This dataset contains information on 25,000 mutual funds in the United States. Each fund is described by a set of attributes that GreatStone, a leading mutual fund rating agency, uses when deciding on the fund's rating. The attributes are provided in the following CSV files: bond_ratings, fund_allocations, fund_config, fund_ratios, fundspecs, other specs, return_3year, return_5year and return_10year.
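Because the attributes are split across several files, it can help to load them all in one step. The sketch below assumes every file sits in one folder and is named `<name>.csv`, as with `Hackathon_Files/external/bond_ratings.csv` used further down. The helper `load_tables` is hypothetical, and a name such as "other specs" may need adjusting to the actual file name on disk.

```python
from pathlib import Path

import pandas as pd


def load_tables(base_dir, names):
    """Read <base_dir>/<name>.csv for each name into a dict of DataFrames.

    Assumes each file is a plain CSV named exactly after its table.
    """
    base = Path(base_dir)
    return {name: pd.read_csv(base / f"{name}.csv") for name in names}


# Hypothetical usage; adjust the names to match the files on disk.
# tables = load_tables("Hackathon_Files/external",
#                      ["bond_ratings", "fund_allocations", "return_3year"])
# tables["bond_ratings"].shape
```

A dictionary keeps each table under its file name, so later steps can refer to `tables["fund_allocations"]` without one variable per file.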
Mutual Fund - Finance
The goal of this hackathon is to predict GreatStone's rating of a mutual fund. To help investors decide which mutual fund to invest in, the task is to build a model that predicts a fund's rating; the attributes that define each mutual fund can be used to build it.
### Import the necessary libraries
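```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import pandas_profiling  # newer releases of this package are published as ydata-profiling (import ydata_profiling)
sns.set(rc={'figure.figsize': (13.7, 8.27)})  # set a larger default figure size for all seaborn plots
```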
```python
# Read the data as a DataFrame
# bond_ratings has 12 columns giving each fund's percentage allocation across bond credit ratings.
# The tag column is a unique identifier and is the same as id (i.e. tag = id).
bond_ratings = pd.read_csv('Hackathon_Files/external/bond_ratings.csv')
pandas_profiling.ProfileReport(bond_ratings)
```
Dataset info
| Number of variables | 12 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 11.0% |
| Total size in memory | 2.3 MiB |
| Average record size in memory | 96.0 B |
Variables types
| Numeric | 11 |
|---|---|
| Categorical | 0 |
| Boolean | 1 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 0 |
| Unsupported | 0 |

Warnings

- a_rating has 16262 / 65.0% zeros
- aa_rating has 16499 / 66.0% zeros
- aaa_rating has 15780 / 63.1% zeros
- b_rating has 17727 / 70.9% zeros
- bb_rating has 16658 / 66.6% zeros
- bbb_rating has 15797 / 63.2% zeros
- below_b_rating has 18160 / 72.6% zeros
- duration_bond has 15126 / 60.5% missing values
- maturity_bond has 16907 / 67.6% missing values
- others_rating has 16602 / 66.4% zeros

a_rating
Numeric
| Distinct count | 1582 |
|---|---|
| Unique (%) | 6.3% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 5.0544 |
|---|---|
| Minimum | 0 |
| Maximum | 72.87 |
| Zeros (%) | 65.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 8.34 |
| 95-th percentile | 25.7 |
| Maximum | 72.87 |
| Range | 72.87 |
| Interquartile range | 8.34 |
Descriptive statistics
| Standard deviation | 9.2618 |
|---|---|
| Coef of variation | 1.8324 |
| Kurtosis | 5.8703 |
| Mean | 5.0544 |
| MAD | 6.8295 |
| Skewness | 2.2998 |
| Sum | 125780 |
| Variance | 85.781 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16262 | 65.0% |
| 10.5 | 49 | 0.2% |
| 11.98 | 44 | 0.2% |
| 10.84 | 33 | 0.1% |
| 11.48 | 33 | 0.1% |
| 8.21 | 28 | 0.1% |
| 5.64 | 26 | 0.1% |
| 11.56 | 25 | 0.1% |
| 4.43 | 24 | 0.1% |
| 12.42 | 24 | 0.1% |
| Other values (1571) | 8338 | 33.4% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16262 | 65.0% |
| 0.01 | 7 | 0.0% |
| 0.02 | 10 | 0.0% |
| 0.04 | 6 | 0.0% |
| 0.05 | 3 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 59.94 | 3 | 0.0% |
| 60.42 | 2 | 0.0% |
| 60.93 | 7 | 0.0% |
| 66.64 | 3 | 0.0% |
| 72.87 | 1 | 0.0% |

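The derived figures in each summary can be recomputed with plain pandas, which is useful for checking the report or for profiling a column on its own. For a_rating, the coefficient of variation is 9.2618 / 5.0544 ≈ 1.8324 and the interquartile range is 8.34 - 0 = 8.34. MAD in these reports is the mean absolute deviation around the mean: the median absolute deviation could not be 6.8295 for a_rating, because more than half of its values equal the median of 0. A minimal sketch, assuming the sample (ddof=1) standard deviation; MAD is computed by hand because `Series.mad` was removed in pandas 2.0. The function name is hypothetical.

```python
import pandas as pd


def describe_numeric(s):
    """Recompute the descriptive statistics shown in the profile report.

    Missing values are dropped first, as pandas does by default.
    """
    s = pd.Series(s, dtype="float64").dropna()
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    q1, q3 = s.quantile([0.25, 0.75])
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,                # coefficient of variation
        "iqr": q3 - q1,                  # interquartile range
        "mad": (s - mean).abs().mean(),  # mean absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),            # excess kurtosis (normal distribution = 0)
    }


# Hypothetical usage:
# describe_numeric(bond_ratings["a_rating"])
```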
aa_rating
Numeric
| Distinct count | 1404 |
|---|---|
| Unique (%) | 5.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4.2091 |
|---|---|
| Minimum | -0.19 |
| Maximum | 90.22 |
| Zeros (%) | 66.0% |
Quantile statistics
| Minimum | -0.19 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 3.01 |
| 95-th percentile | 30.282 |
| Maximum | 90.22 |
| Range | 90.41 |
| Interquartile range | 3.01 |
Descriptive statistics
| Standard deviation | 11.165 |
|---|---|
| Coef of variation | 2.6525 |
| Kurtosis | 14.38 |
| Mean | 4.2091 |
| MAD | 6.0853 |
| Skewness | 3.7139 |
| Sum | 104750 |
| Variance | 124.65 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16499 | 66.0% |
| 3.94 | 54 | 0.2% |
| 1.5 | 53 | 0.2% |
| 3.04 | 49 | 0.2% |
| 5.05 | 47 | 0.2% |
| 1.66 | 42 | 0.2% |
| 3.65 | 41 | 0.2% |
| 0.6 | 38 | 0.2% |
| 3.33 | 38 | 0.2% |
| 4.68 | 38 | 0.2% |
| Other values (1393) | 7987 | 31.9% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.19 | 7 | 0.0% |
| -0.02 | 3 | 0.0% |
| -0.01 | 1 | 0.0% |
| 0.0 | 16499 | 66.0% |
| 0.01 | 8 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 80.42 | 1 | 0.0% |
| 81.36 | 7 | 0.0% |
| 84.39 | 1 | 0.0% |
| 85.68 | 1 | 0.0% |
| 90.22 | 1 | 0.0% |

aaa_rating
Numeric
| Distinct count | 2008 |
|---|---|
| Unique (%) | 8.0% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 14.558 |
|---|---|
| Minimum | -3.15 |
| Maximum | 118.65 |
| Zeros (%) | 63.1% |
Quantile statistics
| Minimum | -3.15 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 18.955 |
| 95-th percentile | 72.31 |
| Maximum | 118.65 |
| Range | 121.8 |
| Interquartile range | 18.955 |
Descriptive statistics
| Standard deviation | 25.637 |
|---|---|
| Coef of variation | 1.761 |
| Kurtosis | 1.7023 |
| Mean | 14.558 |
| MAD | 20.163 |
| Skewness | 1.6867 |
| Sum | 362300 |
| Variance | 657.25 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 15780 | 63.1% |
| 100.0 | 221 | 0.9% |
| 53.38 | 34 | 0.1% |
| 73.18 | 34 | 0.1% |
| 72.2 | 29 | 0.1% |
| 99.99 | 26 | 0.1% |
| 1.4 | 25 | 0.1% |
| 31.83 | 24 | 0.1% |
| 87.93 | 24 | 0.1% |
| 12.3 | 24 | 0.1% |
| Other values (1997) | 8665 | 34.7% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -3.15 | 1 | 0.0% |
| -0.87 | 3 | 0.0% |
| -0.66 | 1 | 0.0% |
| -0.41 | 7 | 0.0% |
| -0.38 | 3 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 100.0 | 221 | 0.9% |
| 100.06 | 5 | 0.0% |
| 100.45 | 2 | 0.0% |
| 105.51 | 5 | 0.0% |
| 118.65 | 1 | 0.0% |

b_rating
Numeric
| Distinct count | 1152 |
|---|---|
| Unique (%) | 4.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 3.2344 |
|---|---|
| Minimum | -0.12 |
| Maximum | 80.68 |
| Zeros (%) | 70.9% |
Quantile statistics
| Minimum | -0.12 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.71 |
| 95-th percentile | 21.032 |
| Maximum | 80.68 |
| Range | 80.8 |
| Interquartile range | 0.71 |
Descriptive statistics
| Standard deviation | 9.1972 |
|---|---|
| Coef of variation | 2.8435 |
| Kurtosis | 16.97 |
| Mean | 3.2344 |
| MAD | 5.012 |
| Skewness | 3.9549 |
| Sum | 80491 |
| Variance | 84.588 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 17727 | 70.9% |
| 0.43 | 55 | 0.2% |
| 0.32 | 51 | 0.2% |
| 0.01 | 38 | 0.2% |
| 0.71 | 33 | 0.1% |
| 0.7 | 33 | 0.1% |
| 0.74 | 30 | 0.1% |
| 0.39 | 29 | 0.1% |
| 1.2 | 28 | 0.1% |
| 3.44 | 28 | 0.1% |
| Other values (1141) | 6834 | 27.3% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.12 | 7 | 0.0% |
| 0.0 | 17727 | 70.9% |
| 0.01 | 38 | 0.2% |
| 0.02 | 17 | 0.1% |
| 0.03 | 16 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 68.76 | 4 | 0.0% |
| 72.35 | 4 | 0.0% |
| 73.88 | 6 | 0.0% |
| 77.31 | 4 | 0.0% |
| 80.68 | 6 | 0.0% |

bb_rating
Numeric
| Distinct count | 1272 |
|---|---|
| Unique (%) | 5.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 3.4738 |
|---|---|
| Minimum | 0 |
| Maximum | 80.47 |
| Zeros (%) | 66.6% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 2.45 |
| 95-th percentile | 22.01 |
| Maximum | 80.47 |
| Range | 80.47 |
| Interquartile range | 2.45 |
Descriptive statistics
| Standard deviation | 8.2997 |
|---|---|
| Coef of variation | 2.3892 |
| Kurtosis | 12.986 |
| Mean | 3.4738 |
| MAD | 5.0468 |
| Skewness | 3.3951 |
| Sum | 86449 |
| Variance | 68.886 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16658 | 66.6% |
| 0.03 | 79 | 0.3% |
| 0.02 | 49 | 0.2% |
| 1.69 | 43 | 0.2% |
| 1.06 | 41 | 0.2% |
| 4.5 | 36 | 0.1% |
| 0.83 | 34 | 0.1% |
| 4.08 | 31 | 0.1% |
| 3.18 | 31 | 0.1% |
| 6.51 | 29 | 0.1% |
| Other values (1261) | 7855 | 31.4% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16658 | 66.6% |
| 0.01 | 2 | 0.0% |
| 0.02 | 49 | 0.2% |
| 0.03 | 79 | 0.3% |
| 0.04 | 19 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 67.44 | 2 | 0.0% |
| 67.89 | 3 | 0.0% |
| 69.26 | 4 | 0.0% |
| 72.46 | 1 | 0.0% |
| 80.47 | 1 | 0.0% |

bbb_rating
Numeric
| Distinct count | 1645 |
|---|---|
| Unique (%) | 6.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 6.1263 |
|---|---|
| Minimum | 0 |
| Maximum | 98 |
| Zeros (%) | 63.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 11.39 |
| 95-th percentile | 27.44 |
| Maximum | 98 |
| Range | 98 |
| Interquartile range | 11.39 |
Descriptive statistics
| Standard deviation | 10.598 |
|---|---|
| Coef of variation | 1.7299 |
| Kurtosis | 6.5009 |
| Mean | 6.1263 |
| MAD | 8.0958 |
| Skewness | 2.2379 |
| Sum | 152460 |
| Variance | 112.32 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 15797 | 63.2% |
| 11.64 | 38 | 0.2% |
| 4.3 | 36 | 0.1% |
| 14.63 | 35 | 0.1% |
| 12.96 | 33 | 0.1% |
| 18.33 | 33 | 0.1% |
| 18.82 | 33 | 0.1% |
| 22.1 | 31 | 0.1% |
| 13.99 | 30 | 0.1% |
| 12.0 | 30 | 0.1% |
| Other values (1634) | 8790 | 35.2% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 15797 | 63.2% |
| 0.01 | 1 | 0.0% |
| 0.02 | 3 | 0.0% |
| 0.03 | 5 | 0.0% |
| 0.05 | 5 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 71.56 | 10 | 0.0% |
| 73.5 | 2 | 0.0% |
| 78.29 | 6 | 0.0% |
| 84.0 | 6 | 0.0% |
| 98.0 | 1 | 0.0% |

below_b_rating
Numeric
| Distinct count | 659 |
|---|---|
| Unique (%) | 2.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.82752 |
|---|---|
| Minimum | -0.02 |
| Maximum | 42.3 |
| Zeros (%) | 72.6% |
Quantile statistics
| Minimum | -0.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.1 |
| 95-th percentile | 4.72 |
| Maximum | 42.3 |
| Range | 42.32 |
| Interquartile range | 0.1 |
Descriptive statistics
| Standard deviation | 2.7 |
|---|---|
| Coef of variation | 3.2628 |
| Kurtosis | 51.951 |
| Mean | 0.82752 |
| MAD | 1.2977 |
| Skewness | 6.1467 |
| Sum | 20594 |
| Variance | 7.2901 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 18160 | 72.6% |
| 0.03 | 102 | 0.4% |
| 0.01 | 91 | 0.4% |
| 0.1 | 64 | 0.3% |
| 0.09 | 61 | 0.2% |
| 0.75 | 59 | 0.2% |
| 0.17 | 58 | 0.2% |
| 0.27 | 57 | 0.2% |
| 0.14 | 57 | 0.2% |
| 0.2 | 54 | 0.2% |
| Other values (648) | 6123 | 24.5% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 5 | 0.0% |
| 0.0 | 18160 | 72.6% |
| 0.01 | 91 | 0.4% |
| 0.02 | 48 | 0.2% |
| 0.03 | 102 | 0.4% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 34.3 | 9 | 0.0% |
| 35.91 | 4 | 0.0% |
| 38.83 | 3 | 0.0% |
| 39.0 | 1 | 0.0% |
| 42.3 | 3 | 0.0% |

duration_bond
Numeric
| Distinct count | 799 |
|---|---|
| Unique (%) | 3.2% |
| Missing (%) | 60.5% |
| Missing (n) | 15126 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4.6431 |
|---|---|
| Minimum | -3.01 |
| Maximum | 25 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -3.01 |
|---|---|
| 5-th percentile | 0.7165 |
| Q1 | 3.5 |
| Median | 4.8 |
| Q3 | 5.76 |
| 95-th percentile | 7.49 |
| Maximum | 25 |
| Range | 28.01 |
| Interquartile range | 2.26 |
Descriptive statistics
| Standard deviation | 2.2671 |
|---|---|
| Coef of variation | 0.48828 |
| Kurtosis | 7.2024 |
| Mean | 4.6431 |
| MAD | 1.5911 |
| Skewness | 1.093 |
| Sum | 45846 |
| Variance | 5.1398 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 5.39 | 83 | 0.3% |
| 5.8 | 75 | 0.3% |
| 3.04 | 56 | 0.2% |
| 5.57 | 55 | 0.2% |
| 5.77 | 53 | 0.2% |
| 4.44 | 53 | 0.2% |
| 5.6 | 53 | 0.2% |
| 4.72 | 50 | 0.2% |
| 5.32 | 49 | 0.2% |
| 4.84 | 48 | 0.2% |
| Other values (788) | 9299 | 37.2% |
| (Missing) | 15126 | 60.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -3.01 | 6 | 0.0% |
| -2.91 | 4 | 0.0% |
| -1.97 | 6 | 0.0% |
| -1.61 | 5 | 0.0% |
| -1.6 | 5 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 18.54 | 2 | 0.0% |
| 21.96 | 1 | 0.0% |
| 24.04 | 2 | 0.0% |
| 24.25 | 1 | 0.0% |
| 25.0 | 2 | 0.0% |

maturity_bond
Numeric
| Distinct count | 1045 |
|---|---|
| Unique (%) | 4.2% |
| Missing (%) | 67.6% |
| Missing (n) | 16907 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 7.7654 |
|---|---|
| Minimum | 0 |
| Maximum | 29.3 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.212 |
| Q1 | 5.46 |
| Median | 7.29 |
| Q3 | 8.92 |
| 95-th percentile | 17.044 |
| Maximum | 29.3 |
| Range | 29.3 |
| Interquartile range | 3.46 |
Descriptive statistics
| Standard deviation | 4.1486 |
|---|---|
| Coef of variation | 0.53423 |
| Kurtosis | 2.7326 |
| Mean | 7.7654 |
| MAD | 2.8679 |
| Skewness | 1.3526 |
| Sum | 62846 |
| Variance | 17.211 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 7.99 | 60 | 0.2% |
| 8.33 | 43 | 0.2% |
| 5.6 | 42 | 0.2% |
| 8.0 | 42 | 0.2% |
| 7.17 | 37 | 0.1% |
| 10.78 | 37 | 0.1% |
| 7.34 | 36 | 0.1% |
| 7.05 | 36 | 0.1% |
| 7.39 | 35 | 0.1% |
| 5.71 | 35 | 0.1% |
| Other values (1034) | 7690 | 30.8% |
| (Missing) | 16907 | 67.6% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 15 | 0.1% |
| 0.01 | 5 | 0.0% |
| 0.07 | 2 | 0.0% |
| 0.12 | 7 | 0.0% |
| 0.14 | 7 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 25.51 | 3 | 0.0% |
| 26.29 | 1 | 0.0% |
| 27.12 | 5 | 0.0% |
| 27.79 | 1 | 0.0% |
| 29.3 | 2 | 0.0% |

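duration_bond and maturity_bond are missing for 60.5% and 67.6% of funds respectively, while the other bond rating columns are missing for only 114 rows. One common way to keep such columns for modelling is to add a 0/1 missing-value indicator and fill the gaps with the column median. The sketch below does that; the function name is hypothetical, and whether a missing duration simply means "not a bond fund" is an assumption worth checking against the other files.

```python
import pandas as pd


def impute_with_indicator(df, cols):
    """Return a copy of df in which each column in cols gets a <col>_missing
    flag (1 where the value was NaN) and its NaNs replaced by the column median."""
    out = df.copy()
    for col in cols:
        out[f"{col}_missing"] = out[col].isna().astype(int)
        out[col] = out[col].fillna(out[col].median())
    return out


# Hypothetical usage:
# bond_ratings_filled = impute_with_indicator(bond_ratings, ["duration_bond", "maturity_bond"])
```

When training a model, the medians should be computed on the training split only and then reused for validation and test data, so that no information leaks across splits.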
others_rating
Numeric
| Distinct count | 992 |
|---|---|
| Unique (%) | 4.0% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.6668 |
|---|---|
| Minimum | -68.21 |
| Maximum | 100 |
| Zeros (%) | 66.4% |
Quantile statistics
| Minimum | -68.21 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.33 |
| 95-th percentile | 8.0075 |
| Maximum | 100 |
| Range | 168.21 |
| Interquartile range | 0.33 |
Descriptive statistics
| Standard deviation | 6.8852 |
|---|---|
| Coef of variation | 4.1308 |
| Kurtosis | 88.788 |
| Mean | 1.6668 |
| MAD | 2.7103 |
| Skewness | 8.1244 |
| Sum | 41479 |
| Variance | 47.405 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 16602 | 66.4% |
| 0.09 | 131 | 0.5% |
| 0.01 | 131 | 0.5% |
| 0.05 | 112 | 0.4% |
| 0.1 | 86 | 0.3% |
| 0.06 | 79 | 0.3% |
| 0.08 | 77 | 0.3% |
| 0.16 | 76 | 0.3% |
| 0.03 | 73 | 0.3% |
| 1.0 | 70 | 0.3% |
| Other values (981) | 7449 | 29.8% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -68.21 | 1 | 0.0% |
| -49.55 | 5 | 0.0% |
| -19.82 | 3 | 0.0% |
| -18.65 | 1 | 0.0% |
| -18.08 | 8 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 83.4 | 9 | 0.0% |
| 92.81 | 5 | 0.0% |
| 95.67 | 8 | 0.0% |
| 99.74 | 5 | 0.0% |
| 100.0 | 14 | 0.1% |

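Although the rating columns are percentage allocations, some values fall outside 0 to 100: aa_rating, aaa_rating, b_rating and below_b_rating have small negative minima, aaa_rating reaches 118.65, and others_rating ranges from -68.21 to 100. Such values may reflect leverage or short positions, but that is an assumption. A quick way to count them per column is sketched below; the helper name and the default bounds are illustrative choices.

```python
import pandas as pd


def out_of_range_counts(df, cols, low=0.0, high=100.0):
    """Count, per column, the values below `low` or above `high`.

    Missing values compare as False and are therefore not counted.
    """
    sub = df[cols]
    return pd.DataFrame({
        "below": (sub < low).sum(),
        "above": (sub > high).sum(),
    })


# Hypothetical usage:
# rating_cols = ["a_rating", "aa_rating", "aaa_rating", "b_rating", "bb_rating",
#                "bbb_rating", "below_b_rating", "others_rating"]
# out_of_range_counts(bond_ratings, rating_cols)
```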
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
|---|---|
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |

us_govt_bond_rating
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Mean | 0 |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 24886 | 99.5% |
| (Missing) | 114 | 0.5% |

Sample

| | bb_rating | us_govt_bond_rating | below_b_rating | others_rating | maturity_bond | b_rating | tag | a_rating | aaa_rating | aa_rating | bbb_rating | duration_bond |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 67922 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 134783 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 61271 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 64412 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 184058 | 0.0 | 0.0 | 0.0 | 0.0 | NaN |

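The Warnings list and the us_govt_bond_rating summary can be reproduced with a short per-column audit. In us_govt_bond_rating every non-missing value is 0.0, so the column carries no information for a model. The sketch below expresses zero and missing counts as percentages of all rows, which matches how the report states them (16262 / 25000 = 65.0% zeros for a_rating), and flags columns with at most one distinct non-missing value. The function name is hypothetical.

```python
import pandas as pd


def column_audit(df):
    """Per-column zero and missing counts (as % of all rows), plus a flag
    for columns that have at most one distinct non-missing value."""
    n = len(df)
    zeros = (df == 0).sum()
    missing = df.isna().sum()
    return pd.DataFrame({
        "zeros": zeros,
        "zeros_pct": 100 * zeros / n,
        "missing": missing,
        "missing_pct": 100 * missing / n,
        "constant": df.nunique(dropna=True) <= 1,
    })


# Hypothetical usage:
# audit = column_audit(bond_ratings)
# audit[audit["constant"]]  # candidates to drop, e.g. us_govt_bond_rating
```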
```python
# fund_allocations has 12 columns giving each fund's percentage allocation to each sector
fund_allocations = pd.read_csv('Hackathon_Files/external/fund_allocations.csv')
pandas_profiling.ProfileReport(fund_allocations)
```
Dataset info
| Number of variables | 12 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 0.4% |
| Total size in memory | 2.3 MiB |
| Average record size in memory | 96.0 B |
Variables types
| Numeric | 12 |
|---|---|
| Categorical | 0 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 0 |
| Unsupported | 0 |

Warnings

- portfolio_communication_allocation has 10266 / 41.1% zeros
- portfolio_consumer_defence_allocation has 8833 / 35.3% zeros
- portfolio_cyclical_consumer_allocation has 7497 / 30.0% zeros
- portfolio_energy_allocation has 8909 / 35.6% zeros
- portfolio_financial_services has 8039 / 32.2% zeros
- portfolio_healthcare_allocation has 8385 / 33.5% zeros
- portfolio_industrials_allocation has 7368 / 29.5% zeros
- portfolio_materials_basic_allocation has 8753 / 35.0% zeros
- portfolio_property_allocation has 10196 / 40.8% zeros
- portfolio_tech_allocation has 7901 / 31.6% zeros
- portfolio_utils_allocation has 11754 / 47.0% zeros

id
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
|---|---|
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |

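The id summary above is identical to the tag summary in bond_ratings (25,000 distinct values from 26000 to 253763), consistent with the note that tag = id. A sketch of joining the two tables on that key, assuming one row per fund in each file; `validate="one_to_one"` makes pandas raise a `MergeError` if a key is duplicated in either table. The function name is hypothetical.

```python
import pandas as pd


def join_on_fund_id(bond_ratings, fund_allocations):
    """Inner-join bond_ratings (key: tag) to fund_allocations (key: id)."""
    merged = bond_ratings.merge(
        fund_allocations,
        left_on="tag",
        right_on="id",
        how="inner",
        validate="one_to_one",  # raise if a fund appears twice in either table
    )
    return merged.drop(columns="id")  # tag and id hold the same values after the join


# Hypothetical usage:
# funds = join_on_fund_id(bond_ratings, fund_allocations)
```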
portfolio_communication_allocation
Numeric
| Distinct count | 982 |
|---|---|
| Unique (%) | 3.9% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.2723 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 41.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 1.18 |
| Q3 | 3.41 |
| 95-th percentile | 7.08 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 3.41 |
Descriptive statistics
| Standard deviation | 4.4046 |
|---|---|
| Coef of variation | 1.9384 |
| Kurtosis | 187.33 |
| Mean | 2.2723 |
| MAD | 2.2526 |
| Skewness | 10.591 |
| Sum | 56548 |
| Variance | 19.401 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 10266 | 41.1% |
| 3.5 | 99 | 0.4% |
| 3.49 | 98 | 0.4% |
| 2.42 | 94 | 0.4% |
| 2.45 | 89 | 0.4% |
| 3.41 | 85 | 0.3% |
| 3.56 | 78 | 0.3% |
| 2.52 | 74 | 0.3% |
| 3.0 | 73 | 0.3% |
| 3.57 | 71 | 0.3% |
| Other values (971) | 13859 | 55.4% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 10266 | 41.1% |
| 0.01 | 71 | 0.3% |
| 0.02 | 60 | 0.2% |
| 0.03 | 50 | 0.2% |
| 0.04 | 14 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 54.88 | 1 | 0.0% |
| 58.81 | 4 | 0.0% |
| 80.84 | 3 | 0.0% |
| 93.31 | 5 | 0.0% |
| 100.0 | 12 | 0.0% |

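portfolio_communication_allocation is strongly right-skewed (skewness 10.591, kurtosis 187.33): the median fund holds 1.18%, while a few hold up to 100%. If a scale-sensitive model such as a linear model is used later, a `log1p` transform is one common way to compress such a tail, and it maps 0 to 0, so funds with no allocation stay at 0. The sketch below uses synthetic log-normal data as a stand-in, not the real column, and only illustrates the drop in skewness.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample standing in for an allocation column (not real data).
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=1.0, sigma=1.0, size=5_000))

# log1p(x) = log(1 + x); it is defined at 0 and keeps zeros at 0.
transformed = np.log1p(values)

skew_before = values.skew()
skew_after = transformed.skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Tree-based models are insensitive to monotonic transforms like this one, so the step mainly matters for linear or distance-based models.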
portfolio_consumer_defence_allocation
Numeric
| Distinct count | 1555 |
|---|---|
| Unique (%) | 6.2% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 5.1113 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 35.3% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 4.84 |
| Q3 | 7.88 |
| 95-th percentile | 13.77 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 7.88 |
Descriptive statistics
| Standard deviation | 6.0785 |
|---|---|
| Coef of variation | 1.1892 |
| Kurtosis | 54.598 |
| Mean | 5.1113 |
| MAD | 4.2654 |
| Skewness | 4.6174 |
| Sum | 127200 |
| Variance | 36.948 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8833 | 35.3% |
| 7.76 | 79 | 0.3% |
| 7.95 | 77 | 0.3% |
| 7.67 | 71 | 0.3% |
| 7.56 | 67 | 0.3% |
| 7.87 | 67 | 0.3% |
| 6.64 | 64 | 0.3% |
| 8.45 | 61 | 0.2% |
| 6.61 | 60 | 0.2% |
| 7.78 | 60 | 0.2% |
| Other values (1544) | 15447 | 61.8% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8833 | 35.3% |
| 0.01 | 39 | 0.2% |
| 0.02 | 4 | 0.0% |
| 0.03 | 6 | 0.0% |
| 0.04 | 17 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 94.66 | 5 | 0.0% |
| 98.05 | 4 | 0.0% |
| 98.75 | 1 | 0.0% |
| 99.91 | 7 | 0.0% |
| 100.0 | 3 | 0.0% |

portfolio_cyclical_consumer_allocation
Numeric
| Distinct count | 1975 |
|---|---|
| Unique (%) | 7.9% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.2116 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 30.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 10.46 |
| Q3 | 13.21 |
| 95-th percentile | 21.24 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 13.21 |
Descriptive statistics
| Standard deviation | 9.701 |
|---|---|
| Coef of variation | 1.0531 |
| Kurtosis | 28.534 |
| Mean | 9.2116 |
| MAD | 6.6222 |
| Skewness | 3.6464 |
| Sum | 229240 |
| Variance | 94.109 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7497 | 30.0% |
| 11.84 | 89 | 0.4% |
| 100.0 | 75 | 0.3% |
| 11.54 | 67 | 0.3% |
| 12.89 | 61 | 0.2% |
| 11.55 | 55 | 0.2% |
| 11.82 | 53 | 0.2% |
| 11.73 | 53 | 0.2% |
| 11.9 | 51 | 0.2% |
| 11.07 | 51 | 0.2% |
| Other values (1964) | 16834 | 67.3% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7497 | 30.0% |
| 0.01 | 9 | 0.0% |
| 0.02 | 4 | 0.0% |
| 0.03 | 6 | 0.0% |
| 0.04 | 6 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 93.23 | 3 | 0.0% |
| 93.38 | 1 | 0.0% |
| 95.24 | 1 | 0.0% |
| 99.76 | 1 | 0.0% |
| 100.0 | 75 | 0.3% |

portfolio_energy_allocation
Numeric
| Distinct count | 1459 |
|---|---|
| Unique (%) | 5.8% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 5.8266 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 35.6% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 3.38 |
| Q3 | 6.25 |
| 95-th percentile | 14.05 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 6.25 |
Descriptive statistics
| Standard deviation | 13.687 |
|---|---|
| Coef of variation | 2.3491 |
| Kurtosis | 33.696 |
| Mean | 5.8266 |
| MAD | 5.7585 |
| Skewness | 5.6142 |
| Sum | 145000 |
| Variance | 187.34 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8909 | 35.6% |
| 100.0 | 240 | 1.0% |
| 5.42 | 89 | 0.4% |
| 5.43 | 87 | 0.3% |
| 5.4 | 85 | 0.3% |
| 5.39 | 67 | 0.3% |
| 5.88 | 66 | 0.3% |
| 5.27 | 64 | 0.3% |
| 5.99 | 61 | 0.2% |
| 6.07 | 60 | 0.2% |
| Other values (1448) | 15158 | 60.6% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8909 | 35.6% |
| 0.01 | 30 | 0.1% |
| 0.02 | 11 | 0.0% |
| 0.03 | 1 | 0.0% |
| 0.04 | 6 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 99.13 | 1 | 0.0% |
| 99.3 | 1 | 0.0% |
| 99.9 | 3 | 0.0% |
| 99.99 | 6 | 0.0% |
| 100.0 | 240 | 1.0% |

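100.0 is the second most frequent value of portfolio_energy_allocation (240 funds, 1.0%), and it is also among the ten most frequent values of portfolio_cyclical_consumer_allocation above. Funds like these hold their whole portfolio in a single sector. A sketch for finding each fund's dominant sector and its share, given a list of sector columns; the helper name is hypothetical.

```python
import pandas as pd


def dominant_sector(df, sector_cols):
    """Return each fund's largest sector allocation and that sector's column name.

    Rows where every sector column is 0 or missing get NaN in both fields.
    """
    sub = df[sector_cols]
    top_share = sub.max(axis=1)  # NaNs are skipped; an all-NaN row gives NaN
    has_alloc = top_share > 0
    top_sector = sub[has_alloc].idxmax(axis=1)
    return pd.DataFrame({
        "top_sector": top_sector.reindex(df.index),
        "top_share": top_share.where(has_alloc),
    })


# Hypothetical usage:
# sector_cols = [c for c in fund_allocations.columns if c.startswith("portfolio_")]
# dom = dominant_sector(fund_allocations, sector_cols)
# (dom["top_share"] == 100).sum()  # funds fully invested in one sector
```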
portfolio_financial_services
Numeric
| Distinct count | 2287 |
|---|---|
| Unique (%) | 9.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 11.838 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 32.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 13.12 |
| Q3 | 17.91 |
| 95-th percentile | 27.49 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 17.91 |
Descriptive statistics
| Standard deviation | 12.286 |
|---|---|
| Coef of variation | 1.0379 |
| Kurtosis | 16.103 |
| Mean | 11.838 |
| MAD | 8.9789 |
| Skewness | 2.6968 |
| Sum | 294600 |
| Variance | 150.96 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8039 | 32.2% |
| 100.0 | 107 | 0.4% |
| 13.38 | 71 | 0.3% |
| 15.63 | 53 | 0.2% |
| 17.4 | 50 | 0.2% |
| 16.79 | 49 | 0.2% |
| 15.39 | 45 | 0.2% |
| 16.02 | 44 | 0.2% |
| 17.92 | 44 | 0.2% |
| 17.56 | 43 | 0.2% |
| Other values (2276) | 16341 | 65.4% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8039 | 32.2% |
| 0.01 | 3 | 0.0% |
| 0.02 | 9 | 0.0% |
| 0.03 | 1 | 0.0% |
| 0.04 | 3 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 99.48 | 1 | 0.0% |
| 99.5 | 1 | 0.0% |
| 99.92 | 5 | 0.0% |
| 99.95 | 5 | 0.0% |
| 100.0 | 107 | 0.4% |

portfolio_healthcare_allocation
Numeric
| Distinct count | 1965 |
|---|---|
| Unique (%) | 7.9% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 8.5369 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 33.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 9.37 |
| Q3 | 13.56 |
| 95-th percentile | 19.8 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 13.56 |
Descriptive statistics
| Standard deviation | 9.6185 |
|---|---|
| Coef of variation | 1.1267 |
| Kurtosis | 31.672 |
| Mean | 8.5369 |
| MAD | 6.8587 |
| Skewness | 3.8548 |
| Sum | 212450 |
| Variance | 92.515 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8385 | 33.5% |
| 12.12 | 63 | 0.3% |
| 12.63 | 60 | 0.2% |
| 14.5 | 56 | 0.2% |
| 12.42 | 53 | 0.2% |
| 14.49 | 53 | 0.2% |
| 11.16 | 47 | 0.2% |
| 11.17 | 47 | 0.2% |
| 12.49 | 45 | 0.2% |
| 16.13 | 45 | 0.2% |
| Other values (1954) | 16032 | 64.1% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8385 | 33.5% |
| 0.01 | 24 | 0.1% |
| 0.02 | 12 | 0.0% |
| 0.03 | 5 | 0.0% |
| 0.04 | 13 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 99.66 | 1 | 0.0% |
| 99.71 | 2 | 0.0% |
| 99.78 | 1 | 0.0% |
| 99.92 | 4 | 0.0% |
| 100.0 | 21 | 0.1% |

portfolio_industrials_allocation
Numeric
| Distinct count | 2037 |
|---|---|
| Unique (%) | 8.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.056 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 29.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 9.57 |
| Q3 | 12.75 |
| 95-th percentile | 21.81 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 12.75 |
Descriptive statistics
| Standard deviation | 10.17 |
|---|---|
| Coef of variation | 1.1231 |
| Kurtosis | 28.854 |
| Mean | 9.056 |
| MAD | 6.5424 |
| Skewness | 3.9358 |
| Sum | 225370 |
| Variance | 103.44 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7368 | 29.5% |
| 10.23 | 69 | 0.3% |
| 100.0 | 67 | 0.3% |
| 10.31 | 64 | 0.3% |
| 11.02 | 61 | 0.2% |
| 10.18 | 52 | 0.2% |
| 10.97 | 50 | 0.2% |
| 10.63 | 48 | 0.2% |
| 11.09 | 45 | 0.2% |
| 11.49 | 45 | 0.2% |
| Other values (2026) | 17017 | 68.1% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7368 | 29.5% |
| 0.01 | 17 | 0.1% |
| 0.02 | 15 | 0.1% |
| 0.03 | 15 | 0.1% |
| 0.04 | 2 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 98.74 | 1 | 0.0% |
| 99.08 | 1 | 0.0% |
| 99.42 | 1 | 0.0% |
| 99.7 | 1 | 0.0% |
| 100.0 | 67 | 0.3% |

portfolio_materials_basic_allocation
Numeric
| Distinct count | 1302 |
|---|---|
| Unique (%) | 5.2% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 3.8983 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 35.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 2.79 |
| Q3 | 5.06 |
| 95-th percentile | 10.207 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 5.06 |
Descriptive statistics
| Standard deviation | 8.1363 |
|---|---|
| Coef of variation | 2.0872 |
| Kurtosis | 89.961 |
| Mean | 3.8983 |
| MAD | 3.5053 |
| Skewness | 8.4998 |
| Sum | 97012 |
| Variance | 66.2 |
| Memory size | 195.4 KiB |

Most frequent values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8753 | 35.0% |
| 4.3 | 97 | 0.4% |
| 4.46 | 81 | 0.3% |
| 100.0 | 77 | 0.3% |
| 4.5 | 74 | 0.3% |
| 2.46 | 67 | 0.3% |
| 4.63 | 64 | 0.3% |
| 5.02 | 61 | 0.2% |
| 4.47 | 60 | 0.2% |
| 4.38 | 60 | 0.2% |
| Other values (1291) | 15492 | 62.0% |
| (Missing) | 114 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8753 | 35.0% |
| 0.01 | 29 | 0.1% |
| 0.02 | 12 | 0.0% |
| 0.03 | 11 | 0.0% |
| 0.04 | 16 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 99.19 | 5 | 0.0% |
| 99.81 | 1 | 0.0% |
| 99.92 | 1 | 0.0% |
| 99.99 | 5 | 0.0% |
| 100.0 | 77 | 0.3% |

portfolio_property_allocation
Numeric
| Distinct count | 1403 |
|---|---|
| Unique (%) | 5.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4.9264 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 40.8% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 1.55 |
| Q3 | 4.44 |
| 95-th percentile | 13.58 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 4.44 |
Descriptive statistics
| Standard deviation | 13.855 |
|---|---|
| Coef of variation | 2.8125 |
| Kurtosis | 33.725 |
| Mean | 4.9264 |
| MAD | 5.7902 |
| Skewness | 5.6824 |
| Sum | 122600 |
| Variance | 191.97 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 10196 | 40.8% |
| 2.5 | 75 | 0.3% |
| 3.73 | 66 | 0.3% |
| 100.0 | 65 | 0.3% |
| 2.51 | 62 | 0.2% |
| 2.44 | 61 | 0.2% |
| 3.92 | 61 | 0.2% |
| 1.47 | 54 | 0.2% |
| 1.79 | 53 | 0.2% |
| 4.26 | 51 | 0.2% |
| Other values (1392) | 14142 | 56.6% |
| (Missing) | 114 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 10196 | 40.8% |
| 0.01 | 35 | 0.1% |
| 0.02 | 33 | 0.1% |
| 0.03 | 26 | 0.1% |
| 0.04 | 17 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.88 | 1 | 0.0% |
| 99.89 | 5 | 0.0% |
| 99.91 | 2 | 0.0% |
| 99.92 | 5 | 0.0% |
| 100.0 | 65 | 0.3% |
portfolio_tech_allocation
Numeric
| Distinct count | 2592 |
|---|---|
| Unique (%) | 10.4% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 12.78 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 31.6% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 12.885 |
| Q3 | 19.65 |
| 95-th percentile | 33.35 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 19.65 |
Descriptive statistics
| Standard deviation | 12.558 |
|---|---|
| Coef of variation | 0.98264 |
| Kurtosis | 5.6537 |
| Mean | 12.78 |
| MAD | 9.8613 |
| Skewness | 1.51 |
| Sum | 318040 |
| Variance | 157.71 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7901 | 31.6% |
| 19.2 | 64 | 0.3% |
| 17.67 | 58 | 0.2% |
| 18.06 | 47 | 0.2% |
| 19.13 | 46 | 0.2% |
| 22.94 | 43 | 0.2% |
| 18.02 | 42 | 0.2% |
| 16.9 | 41 | 0.2% |
| 21.92 | 41 | 0.2% |
| 17.87 | 41 | 0.2% |
| Other values (2581) | 16562 | 66.2% |
| (Missing) | 114 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 7901 | 31.6% |
| 0.01 | 15 | 0.1% |
| 0.02 | 2 | 0.0% |
| 0.03 | 1 | 0.0% |
| 0.06 | 6 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 97.64 | 4 | 0.0% |
| 97.81 | 1 | 0.0% |
| 98.54 | 2 | 0.0% |
| 99.31 | 2 | 0.0% |
| 100.0 | 19 | 0.1% |
portfolio_utils_allocation
Numeric
| Distinct count | 1034 |
|---|---|
| Unique (%) | 4.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.7618 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 47.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0.43 |
| Q3 | 3.34 |
| 95-th percentile | 8.1 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 3.34 |
Descriptive statistics
| Standard deviation | 7.5944 |
|---|---|
| Coef of variation | 2.7498 |
| Kurtosis | 87.457 |
| Mean | 2.7618 |
| MAD | 3.0692 |
| Skewness | 8.413 |
| Sum | 68730 |
| Variance | 57.675 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 11754 | 47.0% |
| 3.34 | 113 | 0.5% |
| 3.33 | 91 | 0.4% |
| 3.2 | 79 | 0.3% |
| 3.21 | 77 | 0.3% |
| 3.35 | 75 | 0.3% |
| 2.86 | 67 | 0.3% |
| 3.15 | 65 | 0.3% |
| 0.01 | 62 | 0.2% |
| 4.18 | 62 | 0.2% |
| Other values (1023) | 12441 | 49.8% |
| (Missing) | 114 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 11754 | 47.0% |
| 0.01 | 62 | 0.2% |
| 0.02 | 40 | 0.2% |
| 0.03 | 38 | 0.2% |
| 0.04 | 34 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 97.53 | 4 | 0.0% |
| 98.59 | 3 | 0.0% |
| 98.8 | 2 | 0.0% |
| 99.02 | 1 | 0.0% |
| 100.0 | 27 | 0.1% |
| | portfolio_communication_allocation | portfolio_financial_services | portfolio_industrials_allocation | portfolio_tech_allocation | portfolio_materials_basic_allocation | portfolio_energy_allocation | portfolio_consumer_defence_allocation | portfolio_healthcare_allocation | portfolio_property_allocation | id | portfolio_utils_allocation | portfolio_cyclical_consumer_allocation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 67922 | 0.00 | 0.00 |
| 1 | 0.78 | 9.77 | 9.97 | 35.51 | 2.86 | 0.38 | 5.88 | 14.41 | 2.67 | 134783 | 0.39 | 17.38 |
| 2 | 4.70 | 16.40 | 11.45 | 25.09 | 8.36 | 0.00 | 9.42 | 16.47 | 1.03 | 61271 | 0.00 | 7.09 |
| 3 | 6.53 | 13.80 | 10.91 | 0.16 | 2.22 | 6.79 | 25.73 | 9.00 | 0.00 | 64412 | 19.42 | 5.43 |
| 4 | 3.49 | 13.95 | 10.51 | 19.26 | 3.75 | 5.11 | 7.29 | 12.22 | 10.41 | 184058 | 3.07 | 10.95 |
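In the sample rows above, each fund's eleven `portfolio_*` sector weights appear to add up to about 100: row 0 is 100% energy, and row 1 sums to 100.00. The sketch below checks this for every fund. It assumes that every column whose name starts with `portfolio_` is a sector weight in percent. It also assumes that a tolerance of 1 percentage point is acceptable, which is an arbitrary choice. Rows 2 and 3 of the toy frame are hypothetical.

```python
import pandas as pd

def allocation_totals(df: pd.DataFrame, prefix: str = "portfolio_") -> pd.Series:
    """Row-wise sum of every column whose name starts with `prefix`."""
    sector_cols = [c for c in df.columns if c.startswith(prefix)]
    # min_count=1: a fund with no sector data at all stays NaN instead of summing to 0.
    return df[sector_cols].sum(axis=1, min_count=1)

def flag_inconsistent(df: pd.DataFrame, prefix: str = "portfolio_", tol: float = 1.0) -> pd.Series:
    """True where sector weights exist but do not add up to 100 +/- tol."""
    totals = allocation_totals(df, prefix)
    return totals.notna() & ((totals - 100).abs() > tol)

cols = [
    "portfolio_communication_allocation", "portfolio_financial_services",
    "portfolio_industrials_allocation", "portfolio_tech_allocation",
    "portfolio_materials_basic_allocation", "portfolio_energy_allocation",
    "portfolio_consumer_defence_allocation", "portfolio_healthcare_allocation",
    "portfolio_property_allocation", "portfolio_utils_allocation",
    "portfolio_cyclical_consumer_allocation",
]
nan = float("nan")
toy = pd.DataFrame(
    [
        [0.00, 0.00, 0.00, 0.00, 0.00, 100.00, 0.00, 0.00, 0.00, 0.00, 0.00],     # sample row 0
        [0.78, 9.77, 9.97, 35.51, 2.86, 0.38, 5.88, 14.41, 2.67, 0.39, 17.38],    # sample row 1
        [nan] * 11,                                                               # hypothetical: no allocation data
        [0.0, 0.0, 0.0, 50.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],                 # hypothetical: weights reach only 50
    ],
    columns=cols,
)
toy["id"] = [67922, 134783, 1, 2]  # the `id` column is not summed (no `portfolio_` prefix)
print(flag_inconsistent(toy).tolist())  # [False, False, False, True]
```

Funds with no sector data are deliberately left unflagged here. The 114 missing rows in the profile are better treated as a separate missing-value problem.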
#fund_config comprises 4 columns that hold the metadata of the mutual funds
fund_config = pd.read_csv('Hackathon_Files/external/fund_config.csv')
#pandas_profiling.ProfileReport(fund_config)
fund_config.info()
fund_config.describe().transpose()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 4 columns):
category          25000 non-null object
parent_company    25000 non-null object
fund_id           25000 non-null object
fund_name         25000 non-null object
dtypes: object(4)
memory usage: 781.3+ KB
| | count | unique | top | freq |
|---|---|---|---|---|
| category | 25000 | 111 | Large Growth | 1335 |
| parent_company | 25000 | 761 | Fidelity Investments | 966 |
| fund_id | 25000 | 25000 | e7dff334-3313-4348-917a-64c631da08f1 | 1 |
| fund_name | 25000 | 24958 | Calamos Investment Trust - Calamos Focus Growt... | 4 |
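The summary shows that `fund_id` is unique, with 25000 distinct values. `fund_name` has only 24958 distinct values, so some names are shared by more than one row; the most frequent appears 4 times. The small sketch below lists those rows before you decide whether a name is a safe key. The toy names are hypothetical.

```python
import pandas as pd

def shared_names(df: pd.DataFrame, name_col: str = "fund_name") -> pd.DataFrame:
    """Return every row whose name also appears on another row, grouped together."""
    # keep=False marks all copies of a duplicated name, not only the later ones.
    mask = df.duplicated(name_col, keep=False)
    # A stable sort keeps the original row order within each shared name.
    return df[mask].sort_values(name_col, kind="mergesort")

toy = pd.DataFrame({
    "fund_id": ["a", "b", "c"],
    "fund_name": ["Fund X", "Fund Y", "Fund X"],
})
print(shared_names(toy)["fund_id"].tolist())  # ['a', 'c']
```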
#fund_ratios consists of 8 columns that provide fundamental ratios describing the mutual funds
fund_ratios = pd.read_csv('Hackathon_Files/external/fund_ratios.csv')
pandas_profiling.ProfileReport(fund_ratios)
Dataset info
| Number of variables | 8 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 0.3% |
| Total size in memory | 1.5 MiB |
| Average record size in memory | 64.0 B |
Variables types
| Numeric | 3 |
|---|---|
| Categorical | 4 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 1 |
| Rejected | 0 |
| Unsupported | 0 |
Warnings
- mmc has a high cardinality: 5689 distinct values [Warning]
- pb_ratio has 6059 / 24.2% zeros [Zeros]
- pb_ratio is highly skewed (γ1 = 30.129) [Skewed]
- pc_ratio has a high cardinality: 1584 distinct values [Warning]
- pe_ratio has a high cardinality: 1782 distinct values [Warning]
- ps_ratio has a high cardinality: 556 distinct values [Warning]

fund_id
Categorical, Unique
| First 3 values |
|---|
| e7dff334-3313-4348-917a-64c631da08f1 |
| abf7f06e-6d96-4016-a9c8-2c7975ecf778 |
| 0edb76db-aca6-4b0f-8e4e-772674e188fa |
| Last 3 values |
|---|
| 5c653690-cbea-4370-908e-582b0c74cc2d |
| c97e052e-0f2d-42bb-bacd-f58e116d4c85 |
| 819f40d9-f07d-480d-9be8-045999bbb7f5 |
First 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0002e898-709a-4b80-8f5c-ec846feff26c | 1 | 0.0% |
| 00070160-01a2-4ad3-9290-958a110c8e9f | 1 | 0.0% |
| 0009d9da-6735-46c1-81cd-dbc62c53c2e2 | 1 | 0.0% |
| 000ad9cc-3f7e-48f3-a1f1-4f5c03d3eb6d | 1 | 0.0% |
| 000b6091-3c16-41a1-9df4-fce73767dd21 | 1 | 0.0% |
Last 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| fff6de73-cbbd-4814-a59a-f0210d669eae | 1 | 0.0% |
| fff75f2a-1419-4d65-a68f-89d601d47350 | 1 | 0.0% |
| fff79179-2ca5-4f26-a023-929c255aeda4 | 1 | 0.0% |
| fffb0e0f-2dc9-4e86-b534-476f9669720b | 1 | 0.0% |
| fffe9b65-2288-4d99-844e-89e7747aa323 | 1 | 0.0% |
fund_ratio_net_annual_expense
Numeric
| Distinct count | 420 |
|---|---|
| Unique (%) | 1.7% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.1217 |
| Minimum | 0 |
| Maximum | 15.17 |
| Zeros (%) | 0.4% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.35 |
| Q1 | 0.72 |
| Median | 1.01 |
| Q3 | 1.44 |
| 95-th percentile | 2.15 |
| Maximum | 15.17 |
| Range | 15.17 |
| Interquartile range | 0.72 |
Descriptive statistics
| Standard deviation | 0.60922 |
|---|---|
| Coef of variation | 0.54313 |
| Kurtosis | 21.129 |
| Mean | 1.1217 |
| MAD | 0.45287 |
| Skewness | 2.0915 |
| Sum | 28042 |
| Variance | 0.37114 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 339 | 1.4% |
| 0.95 | 334 | 1.3% |
| 0.75 | 322 | 1.3% |
| 0.9 | 311 | 1.2% |
| 0.65 | 301 | 1.2% |
| 0.8 | 295 | 1.2% |
| 0.85 | 276 | 1.1% |
| 1.15 | 265 | 1.1% |
| 0.99 | 263 | 1.1% |
| 1.25 | 256 | 1.0% |
| Other values (410) | 22038 | 88.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 89 | 0.4% |
| 0.01 | 35 | 0.1% |
| 0.02 | 12 | 0.0% |
| 0.03 | 34 | 0.1% |
| 0.04 | 26 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.26 | 1 | 0.0% |
| 8.95 | 1 | 0.0% |
| 10.39 | 1 | 0.0% |
| 10.64 | 1 | 0.0% |
| 15.17 | 1 | 0.0% |
mmc
Categorical
| Distinct count | 5689 |
|---|---|
| Unique (%) | 22.8% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6008 | 24.0% |
| 828.01 | 75 | 0.3% |
| 2,193.13 | 41 | 0.2% |
| 9,234.14 | 34 | 0.1% |
| 88,146.69 | 17 | 0.1% |
| 95,232.43 | 17 | 0.1% |
| 1,063.09 | 17 | 0.1% |
| 43,954.74 | 17 | 0.1% |
| 39,247.34 | 17 | 0.1% |
| 23,042.48 | 17 | 0.1% |
| Other values (5678) | 18626 | 74.5% |
| (Missing) | 114 | 0.5% |
pb_ratio
Numeric
| Distinct count | 604 |
|---|---|
| Unique (%) | 2.4% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.8543 |
| Minimum | 0 |
| Maximum | 123.3 |
| Zeros (%) | 24.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.56 |
| Median | 1.85 |
| Q3 | 2.38 |
| 95-th percentile | 4.5 |
| Maximum | 123.3 |
| Range | 123.3 |
| Interquartile range | 1.82 |
Descriptive statistics
| Standard deviation | 2.9842 |
|---|---|
| Coef of variation | 1.6094 |
| Kurtosis | 1211.6 |
| Mean | 1.8543 |
| MAD | 1.1158 |
| Skewness | 30.129 |
| Sum | 46145 |
| Variance | 8.9057 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 6059 | 24.2% |
| 2.0 | 235 | 0.9% |
| 2.01 | 218 | 0.9% |
| 1.94 | 181 | 0.7% |
| 2.13 | 180 | 0.7% |
| 1.96 | 173 | 0.7% |
| 2.03 | 172 | 0.7% |
| 1.92 | 170 | 0.7% |
| 2.02 | 167 | 0.7% |
| 1.99 | 158 | 0.6% |
| Other values (593) | 17173 | 68.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 6059 | 24.2% |
| 0.12 | 2 | 0.0% |
| 0.26 | 7 | 0.0% |
| 0.27 | 6 | 0.0% |
| 0.29 | 5 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.77 | 1 | 0.0% |
| 11.17 | 2 | 0.0% |
| 14.07 | 4 | 0.0% |
| 22.47 | 17 | 0.1% |
| 123.3 | 11 | 0.0% |
pc_ratio
Categorical
| Distinct count | 1584 |
|---|---|
| Unique (%) | 6.3% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4144 | 16.6% |
| 0.0 | 1900 | 7.6% |
| 6.99 | 98 | 0.4% |
| 7.18 | 98 | 0.4% |
| 0.46 | 92 | 0.4% |
| 7.21 | 81 | 0.3% |
| 7.63 | 78 | 0.3% |
| 7.54 | 77 | 0.3% |
| 7.23 | 76 | 0.3% |
| 7.96 | 76 | 0.3% |
| Other values (1573) | 18166 | 72.7% |
| (Missing) | 114 | 0.5% |
pe_ratio
Categorical
| Distinct count | 1782 |
|---|---|
| Unique (%) | 7.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4128 | 16.5% |
| 0.0 | 1910 | 7.6% |
| 3.65 | 92 | 0.4% |
| 15.14 | 89 | 0.4% |
| 15.37 | 87 | 0.3% |
| 15.87 | 86 | 0.3% |
| 17.05 | 69 | 0.3% |
| 16.15 | 67 | 0.3% |
| 16.57 | 66 | 0.3% |
| 15.09 | 66 | 0.3% |
| Other values (1771) | 18226 | 72.9% |
| (Missing) | 114 | 0.5% |
ps_ratio
Categorical
| Distinct count | 556 |
|---|---|
| Unique (%) | 2.2% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 3959 | 15.8% |
| 0 | 2026 | 8.1% |
| 1.49 | 273 | 1.1% |
| 1.47 | 252 | 1.0% |
| 1.51 | 249 | 1.0% |
| 1.45 | 238 | 1.0% |
| 1.5 | 221 | 0.9% |
| 0.99 | 193 | 0.8% |
| 1.31 | 183 | 0.7% |
| 1.54 | 180 | 0.7% |
| Other values (545) | 17112 | 68.4% |
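`mmc`, `pc_ratio`, `pe_ratio` and `ps_ratio` are profiled as categorical even though they hold numbers. `mmc` uses thousands separators (for example `2,193.13`). In the other three columns, `0` and `0.0` are counted as different values, which suggests that the columns were read as text. One way to coerce them is sketched below. It assumes that commas are the only formatting to strip and that anything else unparseable should become NaN.

```python
import pandas as pd

def to_numeric_clean(s: pd.Series) -> pd.Series:
    """Parse a text column of numbers, dropping thousands separators; bad values become NaN."""
    if pd.api.types.is_numeric_dtype(s):
        return s  # already parsed, nothing to do
    # Missing entries stay missing through .str.replace; errors="coerce" turns
    # any remaining non-numeric text into NaN instead of raising.
    return pd.to_numeric(s.str.replace(",", "", regex=False), errors="coerce")

raw = pd.Series(["19,857.41", "0", "0.0", None, "5.91"])  # values in the style of the output above
clean = to_numeric_clean(raw)
print(clean)
```

If `mmc` differs from a plain number only by its separators, `pd.read_csv(..., thousands=",")` may parse it at load time instead. Whether that also fixes the three ratio columns depends on which non-numeric text they contain, and the profile does not show this.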
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |
| | fund_id | tag | fund_ratio_net_annual_expense | pb_ratio | ps_ratio | mmc | pc_ratio | pe_ratio |
|---|---|---|---|---|---|---|---|---|
| 0 | 264614c6-5ac3-4146-ba26-1674b136cb40 | 67922 | 1.44 | 1.71 | 1.31 | 19,857.41 | 5.91 | 14.51 |
| 1 | f5ad58c2-fdea-4087-8678-e04744f89f90 | 134783 | 0.58 | 5.30 | 3.38 | 72,347.03 | 15.95 | 18.88 |
| 2 | 3c13f4ab-02c4-4ca7-a133-7e996ec5d0c4 | 61271 | 0.99 | 5.40 | 3.67 | 68,857.43 | 15.97 | 23.27 |
| 3 | ff78bdd8-59eb-4cef-9f3c-b1baacce9554 | 64412 | 0.52 | 2.23 | 1.63 | 43,266.62 | 8.93 | 12.7 |
| 4 | 63d8406d-c525-494a-8e03-d4fc4cfcb571 | 184058 | 0.75 | 2.02 | 1.4 | 43,747.9 | 7.59 | 14.74 |
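The first rows of `fund_allocations` (keyed by `id`) and `fund_ratios` (keyed by `tag`) list the same keys in the same order (67922, 134783, 61271, ...). This is consistent with the note that `tag` equals `id`. `fund_config` has no `tag` column, but it shares `fund_id` with `fund_ratios`. The sketch below joins the three tables on those keys. It assumes that every key is unique within its file; `validate="one_to_one"` makes the merge raise an error if that is not true. The toy frames are hypothetical.

```python
import pandas as pd

def join_fund_tables(allocations: pd.DataFrame, ratios: pd.DataFrame, config: pd.DataFrame) -> pd.DataFrame:
    """Join the per-fund tables into one row per fund."""
    # fund_allocations keys on `id`; rename it so it lines up with fund_ratios' `tag`.
    merged = allocations.rename(columns={"id": "tag"}).merge(
        ratios, on="tag", how="inner", validate="one_to_one"
    )
    # fund_config has no tag, so it joins through fund_id (carried in by fund_ratios).
    return merged.merge(config, on="fund_id", how="left", validate="one_to_one")

allocations = pd.DataFrame({"id": [1, 2], "portfolio_tech_allocation": [0.0, 35.5]})
ratios = pd.DataFrame({"tag": [2, 1], "fund_id": ["b", "a"], "pb_ratio": [5.3, 1.7]})
config = pd.DataFrame({"fund_id": ["a", "b"], "category": ["Cat A", "Cat B"]})

funds = join_fund_tables(allocations, ratios, config).set_index("tag")
print(funds)
```

An inner join on `tag` silently drops funds that are missing from either file. Comparing `len(funds)` with the 25000 rows of each input is a cheap way to catch that.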
#fund_specs contains 9 columns which give information about the specifications of the mutual funds
fund_specs = pd.read_csv('Hackathon_Files/external/fund_specs.csv')
pandas_profiling.ProfileReport(fund_specs)
Dataset info
| Number of variables | 9 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 3.7% |
| Total size in memory | 1.7 MiB |
| Average record size in memory | 72.0 B |
Variables types
| Numeric | 5 |
|---|---|
| Categorical | 3 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 1 |
| Unsupported | 0 |
Warnings
- currency has constant value USD [Rejected]
- fund_size has 1480 / 5.9% missing values [Missing]
- greatstone_rating has 1365 / 5.5% zeros [Zeros]
- greatstone_rating has 5000 / 20.0% missing values [Missing]
- inception_date has a high cardinality: 4383 distinct values [Warning]
- investment_class has 1480 / 5.9% missing values [Missing]
- total_assets is highly skewed (γ1 = 21.584) [Skewed]
- yield has 4134 / 16.5% zeros [Zeros]

currency
Constant
This variable is constant and should be ignored for analysis
| Constant value | USD |
|---|---|
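`currency` holds the single value USD for every fund, so it carries no information for a model. The generic sketch below drops such columns. It assumes that any column with at most one distinct value (NaN counted as a value) is safe to discard.

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns that take more than one distinct value (NaN counts as a value)."""
    n_distinct = df.nunique(dropna=False)
    return df.loc[:, n_distinct > 1]

toy = pd.DataFrame({"currency": ["USD", "USD", "USD"], "yield": [5.57, 0.42, 0.02]})
print(drop_constant_columns(toy).columns.tolist())  # ['yield']
```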
fund_size
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 5.9% |
| Missing (n) | 1480 |
| Value | Count | Frequency (%) |
|---|---|---|
| Large | 14173 | 56.7% |
| Medium | 6009 | 24.0% |
| Small | 3338 | 13.4% |
| (Missing) | 1480 | 5.9% |
greatstone_rating
Numeric
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 20.0% |
| Missing (n) | 5000 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.8397 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros (%) | 5.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| Median | 3 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range | 2 |
Descriptive statistics
| Standard deviation | 1.2774 |
|---|---|
| Coef of variation | 0.44984 |
| Kurtosis | -0.17408 |
| Mean | 2.8397 |
| MAD | 0.99599 |
| Skewness | -0.448 |
| Sum | 56795 |
| Variance | 1.6319 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
| 2.0 | 4230 | 16.9% |
| 5.0 | 1629 | 6.5% |
| 1.0 | 1376 | 5.5% |
| 0.0 | 1365 | 5.5% |
| (Missing) | 5000 | 20.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 1365 | 5.5% |
| 1.0 | 1376 | 5.5% |
| 2.0 | 4230 | 16.9% |
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 1376 | 5.5% |
| 2.0 | 4230 | 16.9% |
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
| 5.0 | 1629 | 6.5% |
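`greatstone_rating` is the target. 5000 of the 25000 funds (20.0%) have no rating. A plausible reading, not confirmed by the profile, is that these are the funds whose rating has to be predicted. Among the 20000 rated funds the classes are uneven: 3 is the most common rating (6786 funds, 33.9%), and 0 is the rarest (1365 funds, 6.8%). The sketch below assumes that reading. It separates labelled from unlabelled rows and reports the class shares among the labelled ones.

```python
import pandas as pd

def split_by_label(df: pd.DataFrame, target: str = "greatstone_rating"):
    """Return (rows with a rating, rows without one)."""
    has_label = df[target].notna()
    return df[has_label].copy(), df[~has_label].copy()

toy = pd.DataFrame({"tag": [1, 2, 3, 4, 5], "greatstone_rating": [3.0, None, 4.0, 3.0, None]})
labelled, unlabelled = split_by_label(toy)
# Class shares among rated funds only; unrated rows are excluded from the denominator.
shares = labelled["greatstone_rating"].value_counts(normalize=True)
print(len(labelled), len(unlabelled))  # 3 2
print(shares)
```

Because the classes are imbalanced, a stratified split of the labelled rows keeps ratings 0 and 1 represented in any validation fold.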
inception_date
Categorical
| Distinct count | 4383 |
|---|---|
| Unique (%) | 17.5% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| 2015-06-29 | 118 | 0.5% |
| 2017-12-28 | 115 | 0.5% |
| 2014-03-31 | 110 | 0.4% |
| 2007-09-27 | 104 | 0.4% |
| 2001-02-28 | 102 | 0.4% |
| 2012-11-07 | 102 | 0.4% |
| 2014-12-30 | 97 | 0.4% |
| 2015-11-29 | 95 | 0.4% |
| 2009-07-05 | 95 | 0.4% |
| 2005-03-31 | 92 | 0.4% |
| Other values (4373) | 23970 | 95.9% |
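`inception_date` is profiled as categorical text with 4383 distinct dates, which is not directly usable as a feature. Converting it to a fund age is one option. The reference date `as_of` below is a hypothetical choice; the dataset does not specify one.

```python
import pandas as pd

def fund_age_years(dates: pd.Series, as_of: str = "2019-12-31") -> pd.Series:
    """Years between each inception date and `as_of`; unparseable dates become NaN."""
    parsed = pd.to_datetime(dates, errors="coerce")  # ISO strings such as "2015-02-02"
    # Timestamp minus a datetime Series gives timedeltas; NaT propagates to NaN days.
    return (pd.Timestamp(as_of) - parsed).dt.days / 365.25

ages = fund_age_years(pd.Series(["2015-02-02", "1987-08-23", "not a date"]))
print(ages.round(2).tolist())
```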
investment_class
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 5.9% |
| Missing (n) | 1480 |
| Value | Count | Frequency (%) |
|---|---|---|
| Blend | 10298 | 41.2% |
| Growth | 6671 | 26.7% |
| Value | 6551 | 26.2% |
| (Missing) | 1480 | 5.9% |
return_ytd
Numeric
| Distinct count | 2752 |
|---|---|
| Unique (%) | 11.0% |
| Missing (%) | 0.4% |
| Missing (n) | 108 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.2889 |
| Minimum | -36.3 |
| Maximum | 46.29 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -36.3 |
|---|---|
| 5-th percentile | 1.37 |
| Q1 | 4.43 |
| Median | 9.82 |
| Q3 | 13.08 |
| 95-th percentile | 18.32 |
| Maximum | 46.29 |
| Range | 82.59 |
| Interquartile range | 8.65 |
Descriptive statistics
| Standard deviation | 5.801 |
|---|---|
| Coef of variation | 0.62451 |
| Kurtosis | 2.2438 |
| Mean | 9.2889 |
| MAD | 4.6308 |
| Skewness | -0.11919 |
| Sum | 231220 |
| Variance | 33.652 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 2.45 | 36 | 0.1% |
| 11.88 | 36 | 0.1% |
| 11.33 | 34 | 0.1% |
| 10.94 | 34 | 0.1% |
| 11.76 | 33 | 0.1% |
| 2.76 | 32 | 0.1% |
| 2.62 | 32 | 0.1% |
| 10.27 | 31 | 0.1% |
| 3.4 | 31 | 0.1% |
| 11.21 | 31 | 0.1% |
| Other values (2741) | 24562 | 98.2% |
| (Missing) | 108 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -36.3 | 1 | 0.0% |
| -36.14 | 1 | 0.0% |
| -27.8 | 1 | 0.0% |
| -27.79 | 1 | 0.0% |
| -27.7 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 38.96 | 1 | 0.0% |
| 41.33 | 1 | 0.0% |
| 45.78 | 1 | 0.0% |
| 45.88 | 1 | 0.0% |
| 46.29 | 1 | 0.0% |
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |
total_assets
Numeric
| Distinct count | 6014 |
|---|---|
| Unique (%) | 24.1% |
| Missing (%) | 0.5% |
| Missing (n) | 119 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 3476500000 |
| Minimum | 19160 |
| Maximum | 772720000000 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 19160 |
|---|---|
| 5-th percentile | 10240000 |
| Q1 | 93030000 |
| Median | 441790000 |
| Q3 | 1620000000 |
| 95-th percentile | 11840000000 |
| Maximum | 772720000000 |
| Range | 772720000000 |
| Interquartile range | 1527000000 |
Descriptive statistics
| Standard deviation | 18275000000 |
|---|---|
| Coef of variation | 5.2568 |
| Kurtosis | 732.19 |
| Mean | 3476500000 |
| MAD | 4840100000 |
| Skewness | 21.584 |
| Sum | 86498000000000 |
| Variance | 3.3398e+20 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1480000000.0 | 65 | 0.3% |
| 1030000000.0 | 63 | 0.3% |
| 1100000000.0 | 62 | 0.2% |
| 1230000000.0 | 58 | 0.2% |
| 1370000000.0 | 58 | 0.2% |
| 1160000000.0 | 56 | 0.2% |
| 1620000000.0 | 56 | 0.2% |
| 1260000000.0 | 55 | 0.2% |
| 1020000000.0 | 52 | 0.2% |
| 1080000000.0 | 52 | 0.2% |
| Other values (6003) | 24304 | 97.2% |
| (Missing) | 119 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19160.0 | 1 | 0.0% |
| 24840.0 | 1 | 0.0% |
| 51310.0 | 1 | 0.0% |
| 73820.0 | 1 | 0.0% |
| 77330.0 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 215930000000.0 | 5 | 0.0% |
| 224720000000.0 | 2 | 0.0% |
| 369860000000.0 | 5 | 0.0% |
| 459650000000.0 | 3 | 0.0% |
| 772720000000.0 | 5 | 0.0% |
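`total_assets` spans 19160 to 772720000000 with a skewness of 21.584, so a handful of very large funds dominate its scale. Its minimum is positive, so a base-10 log is defined for every non-missing value. The sketch below applies the transform to five points taken from the quantile table above (minimum, Q1, median, Q3 and maximum). It assumes that a log scale suits whatever model is used.

```python
import numpy as np
import pandas as pd

# Minimum, Q1, median, Q3 and maximum of total_assets from the profile above.
assets = pd.Series([19160, 93030000, 441790000, 1620000000, 772720000000], dtype="float64")
log_assets = np.log10(assets)  # missing assets (NaN) would stay NaN

print(round(assets.skew(), 2), round(log_assets.skew(), 2))
```

Tree-based models are insensitive to monotone transforms like this one, so the log mainly matters for linear or distance-based models.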
yield
Numeric
| Distinct count | 891 |
|---|---|
| Unique (%) | 3.6% |
| Missing (%) | 0.5% |
| Missing (n) | 127 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.8504 |
| Minimum | 0 |
| Maximum | 45.36 |
| Zeros (%) | 16.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.48 |
| Median | 1.65 |
| Q3 | 2.64 |
| 95-th percentile | 4.98 |
| Maximum | 45.36 |
| Range | 45.36 |
| Interquartile range | 2.16 |
Descriptive statistics
| Standard deviation | 1.8043 |
|---|---|
| Coef of variation | 0.97507 |
| Kurtosis | 45.583 |
| Mean | 1.8504 |
| MAD | 1.271 |
| Skewness | 3.6622 |
| Sum | 46026 |
| Variance | 3.2555 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 4134 | 16.5% |
| 2.05 | 88 | 0.4% |
| 1.96 | 83 | 0.3% |
| 1.95 | 77 | 0.3% |
| 2.15 | 76 | 0.3% |
| 2.35 | 76 | 0.3% |
| 1.87 | 76 | 0.3% |
| 2.06 | 76 | 0.3% |
| 1.3 | 75 | 0.3% |
| 2.03 | 74 | 0.3% |
| Other values (880) | 20038 | 80.2% |
| (Missing) | 127 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 4134 | 16.5% |
| 0.0068 | 1 | 0.0% |
| 0.01 | 44 | 0.2% |
| 0.02 | 61 | 0.2% |
| 0.03 | 44 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 29.94 | 1 | 0.0% |
| 30.16 | 1 | 0.0% |
| 38.75 | 1 | 0.0% |
| 38.77 | 1 | 0.0% |
| 45.36 | 1 | 0.0% |
| | investment_class | currency | total_assets | yield | greatstone_rating | inception_date | tag | fund_size | return_ytd |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Value | USD | 1.185000e+07 | 5.57 | NaN | 2015-02-02 | 67922 | Large | 20.19 |
| 1 | Growth | USD | 1.397000e+10 | 0.42 | 3.0 | 2012-05-30 | 134783 | Large | 16.79 |
| 2 | Growth | USD | 2.660000e+09 | 0.02 | 4.0 | 1987-08-23 | 61271 | Large | 17.13 |
| 3 | Value | USD | 1.957000e+10 | 2.71 | 3.0 | 2005-10-24 | 64412 | Large | 11.63 |
| 4 | Blend | USD | 2.847000e+07 | 2.44 | 0.0 | 2016-12-12 | 184058 | Large | 10.25 |
#other_specs contains 43 columns that describe other aspects of the mutual funds
other_specs = pd.read_csv('Hackathon_Files/external/other_specs.csv')
pandas_profiling.ProfileReport(other_specs)
Dataset info
| Number of variables | 43 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 10.9% |
| Total size in memory | 8.2 MiB |
| Average record size in memory | 344.0 B |
Variables types
| Numeric | 35 |
|---|---|
| Categorical | 4 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 4 |
| Unsupported | 0 |
Warnings
- 2010_return_category has 11538 / 46.2% missing values [Missing]
- 2010_return_fund has 12262 / 49.0% missing values [Missing]
- 2011_return_category has 10533 / 42.1% missing values [Missing]
- 2011_return_fund has 11163 / 44.7% missing values [Missing]
- 2012_fund_return has 9985 / 39.9% missing values [Missing]
- 2012_return_category has 9124 / 36.5% missing values [Missing]
- 2013_category_return is highly correlated with 2013_return_fund (ρ = 0.9414) [Rejected]
- 2013_return_fund has 8538 / 34.2% missing values [Missing]
- 2014_category_return has 6183 / 24.7% missing values [Missing]
- 2014_return_fund has 7206 / 28.8% missing values [Missing]
- 2015_return_fund has 5688 / 22.8% missing values [Missing]
- 2016_return_category has 3097 / 12.4% missing values [Missing]
- 2016_return_fund has 3931 / 15.7% missing values [Missing]
- 2017_category_return has 1428 / 5.7% missing values [Missing]
- 2017_return_fund is highly correlated with 2017_category_return (ρ = 0.91277) [Rejected]
- 2018_return_category has 809 / 3.2% missing values [Missing]
- 2018_return_fund has 940 / 3.8% missing values [Missing]
- 3_months_return_category is highly correlated with ytd_return_category (ρ = 1) [Rejected]
- bond_percentage_of_porfolio has 11583 / 46.3% zeros [Zeros]
- cash_percent_of_portfolio has 1253 / 5.0% zeros [Zeros]
- category_return_2015 has 4601 / 18.4% missing values [Missing]
- fund_return_3months is highly correlated with ytd_return_fund (ρ = 0.97222) [Rejected]
- fund_return_3years has 1540 / 6.2% zeros [Zeros]
- greatstone_rating has 1365 / 5.5% zeros [Zeros]
- greatstone_rating has 5000 / 20.0% missing values [Missing]
- mmc has a high cardinality: 5689 distinct values [Warning]
- pb_ratio has 6059 / 24.2% zeros [Zeros]
- pb_ratio is highly skewed (γ1 = 30.129) [Skewed]
- pc_ratio has a high cardinality: 1584 distinct values [Warning]
- pe_ratio has a high cardinality: 1782 distinct values [Warning]
- portfolio_convertable has 17032 / 68.1% zeros [Zeros]
- portfolio_others has 14058 / 56.2% zeros [Zeros]
- portfolio_preferred has 18290 / 73.2% zeros [Zeros]
- ps_ratio has a high cardinality: 556 distinct values [Warning]
- stock_percent_of_portfolio has 5145 / 20.6% zeros [Zeros]
- years_down has 1641 / 6.6% missing values [Missing]
- years_up has 1812 / 7.2% missing values [Missing]

1_month_fund_return
Numeric
| Distinct count | 1266 |
|---|---|
| Unique (%) | 5.1% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.96168 |
| Minimum | -13.63 |
| Maximum | 15.29 |
| Zeros (%) | 0.5% |
Quantile statistics
| Minimum | -13.63 |
|---|---|
| 5-th percentile | -2.03 |
| Q1 | 0.35 |
| Median | 1.1 |
| Q3 | 1.72 |
| 95-th percentile | 3.35 |
| Maximum | 15.29 |
| Range | 28.92 |
| Interquartile range | 1.37 |
Descriptive statistics
| Standard deviation | 1.6943 |
|---|---|
| Coef of variation | 1.7619 |
| Kurtosis | 7.2433 |
| Mean | 0.96168 |
| MAD | 1.1087 |
| Skewness | 0.0010391 |
| Sum | 23932 |
| Variance | 2.8708 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1.17 | 156 | 0.6% |
| 1.18 | 154 | 0.6% |
| 1.15 | 142 | 0.6% |
| 1.28 | 138 | 0.6% |
| 1.23 | 133 | 0.5% |
| 0.0 | 131 | 0.5% |
| 1.12 | 130 | 0.5% |
| 1.39 | 129 | 0.5% |
| 1.32 | 128 | 0.5% |
| 1.27 | 128 | 0.5% |
| Other values (1255) | 23516 | 94.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -13.63 | 1 | 0.0% |
| -10.34 | 1 | 0.0% |
| -10.22 | 1 | 0.0% |
| -9.53 | 1 | 0.0% |
| -9.48 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.26 | 1 | 0.0% |
| 15.11 | 1 | 0.0% |
| 15.17 | 1 | 0.0% |
| 15.27 | 1 | 0.0% |
| 15.29 | 1 | 0.0% |
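The `other_specs` warnings reject four columns for being highly correlated with another column, for example `3_months_return_category` with `ytd_return_category` (ρ = 1). The sketch below reproduces that check so you can decide which member of each pair to keep. It assumes Pearson correlation on the numeric columns and a threshold of 0.9. That threshold is an assumption chosen to cover the flagged pairs, whose lowest ρ is 0.91277; it is not a documented pandas_profiling setting.

```python
import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """List (col_a, col_b, |r|) for numeric column pairs whose |Pearson r| exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, so each pair appears once
            r = corr.iloc[i, j]
            if r > threshold:  # NaN (e.g. a constant column) never passes
                pairs.append((cols[i], cols[j], r))
    return pairs

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * a
    "c": [4.0, 1.0, 3.0, 2.0],   # |r| with a (and with b) is 0.4
})
print([(x, y) for x, y, _ in correlated_pairs(toy)])  # [('a', 'b')]
```

Which column of a pair to drop is a modelling choice. For return columns, keeping the fund-level series over the category-level one is one reasonable default.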
1_year_return_fund
Numeric
| Distinct count | 3653 |
|---|---|
| Unique (%) | 14.6% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.6062 |
| Minimum | -37.09 |
| Maximum | 59.19 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -37.09 |
|---|---|
| 5-th percentile | -9.416 |
| Q1 | -0.06 |
| Median | 3.1 |
| Q3 | 5.12 |
| 95-th percentile | 13.43 |
| Maximum | 59.19 |
| Range | 96.28 |
| Interquartile range | 5.18 |
Descriptive statistics
| Standard deviation | 6.6941 |
|---|---|
| Coef of variation | 2.5685 |
| Kurtosis | 2.9963 |
| Mean | 2.6062 |
| MAD | 4.5776 |
| Skewness | -0.0084128 |
| Sum | 64855 |
| Variance | 44.81 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.45 | 49 | 0.2% |
| 3.34 | 49 | 0.2% |
| 2.32 | 47 | 0.2% |
| 3.46 | 47 | 0.2% |
| 2.92 | 46 | 0.2% |
| 4.35 | 46 | 0.2% |
| 2.75 | 46 | 0.2% |
| 3.39 | 45 | 0.2% |
| 2.83 | 45 | 0.2% |
| 4.27 | 45 | 0.2% |
| Other values (3642) | 24420 | 97.7% |
| (Missing) | 115 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -37.09 | 1 | 0.0% |
| -36.54 | 1 | 0.0% |
| -35.73 | 1 | 0.0% |
| -35.47 | 1 | 0.0% |
| -35.2 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 47.08 | 1 | 0.0% |
| 47.11 | 1 | 0.0% |
| 50.43 | 1 | 0.0% |
| 52.43 | 1 | 0.0% |
| 59.19 | 1 | 0.0% |
2010_return_category
Numeric
| Distinct count | 103 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 46.2% |
| Missing (n) | 11538 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 13.155 |
| Minimum | -28.95 |
| Maximum | 41.56 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -28.95 |
|---|---|
| 5-th percentile | 1.65 |
| Q1 | 8.6 |
| Median | 13.66 |
| Q3 | 15.53 |
| 95-th percentile | 26.17 |
| Maximum | 41.56 |
| Range | 70.51 |
| Interquartile range | 6.93 |
Descriptive statistics
| Standard deviation | 7.6595 |
|---|---|
| Coef of variation | 0.58224 |
| Kurtosis | 2.5943 |
| Mean | 13.155 |
| MAD | 5.5914 |
| Skewness | -0.21867 |
| Sum | 177090 |
| Variance | 58.667 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 15.53 | 850 | 3.4% |
| 14.01 | 844 | 3.4% |
| 13.66 | 701 | 2.8% |
| 7.72 | 599 | 2.4% |
| 11.83 | 457 | 1.8% |
| 13.74 | 412 | 1.6% |
| 26.98 | 409 | 1.6% |
| 10.24 | 394 | 1.6% |
| 25.61 | 387 | 1.5% |
| 24.61 | 371 | 1.5% |
| Other values (92) | 8038 | 32.2% |
| (Missing) | 11538 | 46.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -28.95 | 8 | 0.0% |
| -28.7 | 4 | 0.0% |
| -24.28 | 50 | 0.2% |
| -15.61 | 13 | 0.1% |
| -2.0 | 45 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 27.08 | 141 | 0.6% |
| 27.35 | 27 | 0.1% |
| 29.99 | 22 | 0.1% |
| 30.88 | 9 | 0.0% |
| 41.56 | 50 | 0.2% |
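The summary blocks above can be recomputed with pandas. This is a useful check after any cleaning step. The sketch below assumes two things. First, "MAD" means the mean absolute deviation about the mean, which is what pandas' former `Series.mad` computed and what older pandas-profiling releases appear to report. Second, quantiles use pandas' default linear interpolation. `describe_like_profile` is a hypothetical helper, not a pandas-profiling function. Called on the real column (for example `describe_like_profile(df["2010_return_category"])`, where `df` is whatever DataFrame holds the data), it should come close to the report's mean of 13.155 and IQR of 6.93 if the inputs are the same.

```python
import numpy as np
import pandas as pd


def describe_like_profile(s: pd.Series) -> pd.Series:
    """Recompute the profile's numeric summary for one column, skipping NaNs."""
    x = s.dropna()
    q = x.quantile([0.05, 0.25, 0.5, 0.75, 0.95])  # pandas default: linear interpolation
    mean, std = x.mean(), x.std()  # std is the sample standard deviation (ddof=1)
    return pd.Series({
        "Missing (%)": 100 * s.isna().mean(),
        "Mean": mean,
        "Standard deviation": std,
        "Coef of variation": std / mean,
        "5-th percentile": q.loc[0.05],
        "Q1": q.loc[0.25],
        "Median": q.loc[0.5],
        "Q3": q.loc[0.75],
        "95-th percentile": q.loc[0.95],
        "Interquartile range": q.loc[0.75] - q.loc[0.25],
        "Range": x.max() - x.min(),
        "MAD": (x - mean).abs().mean(),  # assumed: mean absolute deviation about the mean
        "Skewness": x.skew(),
        "Kurtosis": x.kurt(),  # excess kurtosis (0 for a normal distribution)
        "Variance": x.var(),
    })


# Made-up input; the real call would pass one of the report's columns.
example = describe_like_profile(pd.Series([1.0, 2.0, 3.0, 4.0, np.nan]))
print(example[["Missing (%)", "Mean", "Interquartile range", "MAD"]])
```

The coefficient of variation is simply std / mean. This is why it turns negative for years with a negative mean return, such as 2011_return_category (-3.8568).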
2010_return_fund
Numeric
| Distinct count | 3359 |
|---|---|
| Unique (%) | 13.4% |
| Missing (%) | 49.0% |
| Missing (n) | 12262 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 13.603 |
|---|---|
| Minimum | -51.55 |
| Maximum | 54.5 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -51.55 |
|---|---|
| 5-th percentile | 1.46 |
| Q1 | 8.21 |
| Median | 13.07 |
| Q3 | 18.15 |
| 95-th percentile | 28.501 |
| Maximum | 54.5 |
| Range | 106.05 |
| Interquartile range | 9.94 |
Descriptive statistics
| Standard deviation | 8.9666 |
|---|---|
| Coef of variation | 0.65915 |
| Kurtosis | 4.028 |
| Mean | 13.603 |
| MAD | 6.5522 |
| Skewness | -0.16529 |
| Sum | 173280 |
| Variance | 80.4 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 10.62 | 18 | 0.1% |
| 14.56 | 17 | 0.1% |
| 14.02 | 17 | 0.1% |
| 12.46 | 16 | 0.1% |
| 13.34 | 15 | 0.1% |
| 11.22 | 15 | 0.1% |
| 13.51 | 15 | 0.1% |
| 11.41 | 15 | 0.1% |
| 15.86 | 15 | 0.1% |
| 14.36 | 15 | 0.1% |
| Other values (3348) | 12580 | 50.3% |
| (Missing) | 12262 | 49.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -51.55 | 1 | 0.0% |
| -51.19 | 1 | 0.0% |
| -50.13 | 1 | 0.0% |
| -50.11 | 1 | 0.0% |
| -49.86 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 53.32 | 1 | 0.0% |
| 53.33 | 1 | 0.0% |
| 53.9 | 1 | 0.0% |
| 53.96 | 1 | 0.0% |
| 54.5 | 1 | 0.0% |
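Each frequency table lists the ten most common values. The remaining distinct values are lumped into one "Other values (N)" row, and a "(Missing)" row is added when there are missing entries. Percentages are taken over all 25000 rows, missing ones included; for example, 850 / 25000 = 3.4% for the most common 2010_return_category value. Below is a minimal sketch of that layout. `frequency_table` is a hypothetical helper, and the input series is made up.

```python
import pandas as pd


def frequency_table(s: pd.Series, top: int = 10) -> pd.DataFrame:
    """Top-`top` values, an 'Other values (N)' row and a '(Missing)' row, as in the report."""
    counts = s.value_counts()  # NaN excluded, sorted by count (descending)
    rows = list(counts.head(top).items())
    rest = counts.iloc[top:]
    if len(rest):
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))
    n_missing = int(s.isna().sum())
    if n_missing:
        rows.append(("(Missing)", n_missing))
    table = pd.DataFrame(rows, columns=["Value", "Count"])
    # Percentages use every row, missing ones included, as the report does.
    table["Frequency (%)"] = (100 * table["Count"] / len(s)).round(1)
    return table


print(frequency_table(pd.Series([1, 1, 1, 2, 2, 3, 4, None]), top=2))
```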
2011_return_category
Numeric
| Distinct count | 102 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 42.1% |
| Missing (n) | 10533 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -1.8647 |
|---|---|
| Minimum | -35.5 |
| Maximum | 32.9 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -35.5 |
|---|---|
| 5-th percentile | -14.72 |
| Q1 | -4.07 |
| Median | -2.06 |
| Q3 | 2.01 |
| 95-th percentile | 10.18 |
| Maximum | 32.9 |
| Range | 68.4 |
| Interquartile range | 6.08 |
Descriptive statistics
| Standard deviation | 7.192 |
|---|---|
| Coef of variation | -3.8568 |
| Kurtosis | 1.2128 |
| Mean | -1.8647 |
| MAD | 5.2441 |
| Skewness | -0.44908 |
| Sum | -26977 |
| Variance | 51.724 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| -2.46 | 892 | 3.6% |
| -1.27 | 866 | 3.5% |
| -0.75 | 736 | 2.9% |
| -3.96 | 635 | 2.5% |
| 5.86 | 630 | 2.5% |
| -13.97 | 499 | 2.0% |
| -0.11 | 477 | 1.9% |
| -7.93 | 438 | 1.8% |
| -3.55 | 432 | 1.7% |
| -4.07 | 403 | 1.6% |
| Other values (91) | 8459 | 33.8% |
| (Missing) | 10533 | 42.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -35.5 | 13 | 0.1% |
| -24.95 | 41 | 0.2% |
| -22.64 | 11 | 0.0% |
| -21.45 | 13 | 0.1% |
| -20.95 | 41 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.64 | 119 | 0.5% |
| 10.93 | 132 | 0.5% |
| 11.47 | 12 | 0.0% |
| 11.74 | 70 | 0.3% |
| 32.9 | 16 | 0.1% |
2011_return_fund
Numeric
| Distinct count | 3475 |
|---|---|
| Unique (%) | 13.9% |
| Missing (%) | 44.7% |
| Missing (n) | 11163 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -1.3365 |
|---|---|
| Minimum | -43.78 |
| Maximum | 55.81 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -43.78 |
|---|---|
| 5-th percentile | -16.78 |
| Q1 | -5.33 |
| Median | -0.52 |
| Q3 | 4.04 |
| 95-th percentile | 10.572 |
| Maximum | 55.81 |
| Range | 99.59 |
| Interquartile range | 9.37 |
Descriptive statistics
| Standard deviation | 8.4071 |
|---|---|
| Coef of variation | -6.2903 |
| Kurtosis | 1.8999 |
| Mean | -1.3365 |
| MAD | 6.3388 |
| Skewness | -0.55184 |
| Sum | -18494 |
| Variance | 70.68 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| -2.65 | 16 | 0.1% |
| -2.28 | 16 | 0.1% |
| 1.13 | 16 | 0.1% |
| -2.59 | 16 | 0.1% |
| 2.03 | 15 | 0.1% |
| 1.16 | 15 | 0.1% |
| 1.73 | 15 | 0.1% |
| 1.32 | 15 | 0.1% |
| -0.79 | 15 | 0.1% |
| 1.12 | 15 | 0.1% |
| Other values (3464) | 13683 | 54.7% |
| (Missing) | 11163 | 44.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -43.78 | 1 | 0.0% |
| -43.25 | 1 | 0.0% |
| -42.71 | 1 | 0.0% |
| -42.53 | 1 | 0.0% |
| -41.56 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 43.21 | 1 | 0.0% |
| 44.49 | 1 | 0.0% |
| 55.65 | 1 | 0.0% |
| 55.79 | 1 | 0.0% |
| 55.81 | 1 | 0.0% |
2012_fund_return
Numeric
| Distinct count | 3026 |
|---|---|
| Unique (%) | 12.1% |
| Missing (%) | 39.9% |
| Missing (n) | 9985 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 12.898 |
|---|---|
| Minimum | -43.9 |
| Maximum | 81.66 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -43.9 |
|---|---|
| 5-th percentile | 2.347 |
| Q1 | 8.89 |
| Median | 13.49 |
| Q3 | 16.84 |
| 95-th percentile | 22.839 |
| Maximum | 81.66 |
| Range | 125.56 |
| Interquartile range | 7.95 |
Descriptive statistics
| Standard deviation | 7.1249 |
|---|---|
| Coef of variation | 0.5524 |
| Kurtosis | 7.1155 |
| Mean | 12.898 |
| MAD | 5.1394 |
| Skewness | -0.49862 |
| Sum | 193670 |
| Variance | 50.765 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 15.63 | 24 | 0.1% |
| 13.33 | 22 | 0.1% |
| 15.71 | 21 | 0.1% |
| 16.35 | 21 | 0.1% |
| 14.71 | 20 | 0.1% |
| 13.98 | 20 | 0.1% |
| 15.25 | 19 | 0.1% |
| 12.38 | 19 | 0.1% |
| 16.54 | 18 | 0.1% |
| 16.05 | 18 | 0.1% |
| Other values (3015) | 14813 | 59.3% |
| (Missing) | 9985 | 39.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -43.9 | 1 | 0.0% |
| -43.36 | 1 | 0.0% |
| -37.78 | 1 | 0.0% |
| -37.19 | 1 | 0.0% |
| -35.64 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 54.75 | 1 | 0.0% |
| 64.18 | 1 | 0.0% |
| 65.82 | 1 | 0.0% |
| 79.82 | 1 | 0.0% |
| 81.66 | 1 | 0.0% |
2012_return_category
Numeric
| Distinct count | 103 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 36.5% |
| Missing (n) | 9124 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 12.411 |
|---|---|
| Minimum | -23.7 |
| Maximum | 31.78 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -23.7 |
|---|---|
| 5-th percentile | 2.8 |
| Q1 | 9.01 |
| Median | 14.57 |
| Q3 | 15.46 |
| 95-th percentile | 18.29 |
| Maximum | 31.78 |
| Range | 55.48 |
| Interquartile range | 6.45 |
Descriptive statistics
| Standard deviation | 5.8154 |
|---|---|
| Coef of variation | 0.46857 |
| Kurtosis | 5.7132 |
| Mean | 12.411 |
| MAD | 4.3754 |
| Skewness | -1.1426 |
| Sum | 197040 |
| Variance | 33.819 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 15.34 | 957 | 3.8% |
| 14.96 | 922 | 3.7% |
| 14.57 | 793 | 3.2% |
| 7.01 | 672 | 2.7% |
| 11.72 | 535 | 2.1% |
| 15.84 | 520 | 2.1% |
| 18.29 | 459 | 1.8% |
| 13.15 | 459 | 1.8% |
| 14.67 | 431 | 1.7% |
| 15.46 | 428 | 1.7% |
| Other values (92) | 9700 | 38.8% |
| (Missing) | 9124 | 36.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -23.7 | 50 | 0.2% |
| -19.55 | 8 | 0.0% |
| -10.52 | 13 | 0.1% |
| -9.2 | 50 | 0.2% |
| -7.39 | 51 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 22.64 | 27 | 0.1% |
| 23.62 | 44 | 0.2% |
| 24.77 | 62 | 0.2% |
| 29.69 | 13 | 0.1% |
| 31.78 | 129 | 0.5% |
2013_category_return
Highly correlated
This variable is highly correlated with 2013_return_fund and should be ignored for analysis
| Correlation | 0.9414 |
|---|---|
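The report rejects one column from each highly correlated pair; here 2013_category_return is rejected in favour of 2013_return_fund (r = 0.9414). The sketch below reproduces a similar screen under three assumptions:

- Correlations are Pearson.
- The cut-off is 0.9, believed to be the pandas-profiling 1.x default. Check your installed version.
- Pairs are compared on |r|, which may differ from the report's exact rule.

`flag_highly_correlated` is a hypothetical helper, and the demo columns are synthetic. `DataFrame.corr` uses pairwise-complete observations. This matters here, because the early return years are missing for a large share of funds.

```python
import numpy as np
import pandas as pd


def flag_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> dict:
    """Map each column to drop -> (column it duplicates, Pearson r)."""
    # DataFrame.corr uses pairwise-complete observations, so NaNs in one pair
    # do not remove rows from other pairs.
    corr = df.select_dtypes("number").corr(method="pearson")
    cols = list(corr.columns)
    rejected = {}
    for i, a in enumerate(cols):
        if a in rejected:  # a column already dropped cannot cause further drops
            continue
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if b not in rejected and pd.notna(r) and abs(r) > threshold:
                rejected[b] = (a, float(r))
    return rejected


rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({  # synthetic columns, not the report's data
    "fund_return": base,
    "category_return": 2 * base + 0.01 * rng.normal(size=200),  # near-duplicate
    "unrelated": rng.normal(size=200),
})
print(flag_highly_correlated(demo))
```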
2013_return_fund
Numeric
| Distinct count | 5513 |
|---|---|
| Unique (%) | 22.1% |
| Missing (%) | 34.2% |
| Missing (n) | 8538 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 17.149 |
|---|---|
| Minimum | -67.62 |
| Maximum | 116.38 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -67.62 |
|---|---|
| 5-th percentile | -5.7395 |
| Q1 | 0.84 |
| Median | 18.59 |
| Q3 | 31.69 |
| 95-th percentile | 41.399 |
| Maximum | 116.38 |
| Range | 184 |
| Interquartile range | 30.85 |
Descriptive statistics
| Standard deviation | 17.117 |
|---|---|
| Coef of variation | 0.99814 |
| Kurtosis | 0.23209 |
| Mean | 17.149 |
| MAD | 14.6 |
| Skewness | -0.12006 |
| Sum | 282300 |
| Variance | 292.99 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 14 | 0.1% |
| 0.46 | 14 | 0.1% |
| -1.78 | 14 | 0.1% |
| -2.5 | 13 | 0.1% |
| -2.2 | 13 | 0.1% |
| -0.17 | 12 | 0.0% |
| -2.26 | 12 | 0.0% |
| -1.94 | 12 | 0.0% |
| -2.03 | 12 | 0.0% |
| 19.44 | 12 | 0.0% |
| Other values (5502) | 16334 | 65.3% |
| (Missing) | 8538 | 34.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -67.62 | 1 | 0.0% |
| -67.28 | 1 | 0.0% |
| -54.0 | 1 | 0.0% |
| -53.55 | 1 | 0.0% |
| -53.39 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 93.71 | 1 | 0.0% |
| 108.7 | 1 | 0.0% |
| 110.85 | 1 | 0.0% |
| 114.22 | 1 | 0.0% |
| 116.38 | 1 | 0.0% |
2014_category_return
Numeric
| Distinct count | 109 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 24.7% |
| Missing (n) | 6183 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4.677 |
|---|---|
| Minimum | -17.98 |
| Maximum | 44.59 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -17.98 |
|---|---|
| 5-th percentile | -4.98 |
| Q1 | 1.54 |
| Median | 5.04 |
| Q3 | 9.31 |
| 95-th percentile | 10.96 |
| Maximum | 44.59 |
| Range | 62.57 |
| Interquartile range | 7.77 |
Descriptive statistics
| Standard deviation | 6.2251 |
|---|---|
| Coef of variation | 1.331 |
| Kurtosis | 4.3396 |
| Mean | 4.677 |
| MAD | 4.3946 |
| Skewness | 0.18587 |
| Sum | 88008 |
| Variance | 38.752 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 10.0 | 1078 | 4.3% |
| 10.96 | 1024 | 4.1% |
| 10.21 | 891 | 3.6% |
| 5.18 | 736 | 2.9% |
| 2.79 | 622 | 2.5% |
| 6.21 | 577 | 2.3% |
| -3.01 | 547 | 2.2% |
| 1.11 | 516 | 2.1% |
| 2.44 | 514 | 2.1% |
| -4.98 | 504 | 2.0% |
| Other values (98) | 11808 | 47.2% |
| (Missing) | 6183 | 24.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -17.98 | 82 | 0.3% |
| -17.48 | 52 | 0.2% |
| -17.23 | 14 | 0.1% |
| -16.65 | 58 | 0.2% |
| -14.46 | 15 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21.7 | 17 | 0.1% |
| 27.25 | 93 | 0.4% |
| 28.03 | 184 | 0.7% |
| 33.36 | 7 | 0.0% |
| 44.59 | 14 | 0.1% |
2014_return_fund
Numeric
| Distinct count | 3316 |
|---|---|
| Unique (%) | 13.3% |
| Missing (%) | 28.8% |
| Missing (n) | 7206 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 5.0969 |
|---|---|
| Minimum | -42.4 |
| Maximum | 63.8 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -42.4 |
|---|---|
| 5-th percentile | -6.6835 |
| Q1 | 1.42 |
| Median | 5.2 |
| Q3 | 9.2 |
| 95-th percentile | 14.693 |
| Maximum | 63.8 |
| Range | 106.2 |
| Interquartile range | 7.78 |
Descriptive statistics
| Standard deviation | 7.4266 |
|---|---|
| Coef of variation | 1.4571 |
| Kurtosis | 4.8801 |
| Mean | 5.0969 |
| MAD | 5.1713 |
| Skewness | 0.019534 |
| Sum | 90694 |
| Variance | 55.154 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 5.58 | 29 | 0.1% |
| 5.88 | 26 | 0.1% |
| 5.54 | 26 | 0.1% |
| 4.12 | 26 | 0.1% |
| 5.99 | 25 | 0.1% |
| 5.66 | 24 | 0.1% |
| 4.86 | 24 | 0.1% |
| 6.01 | 23 | 0.1% |
| 5.81 | 23 | 0.1% |
| 5.53 | 23 | 0.1% |
| Other values (3305) | 17545 | 70.2% |
| (Missing) | 7206 | 28.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -42.4 | 1 | 0.0% |
| -40.84 | 1 | 0.0% |
| -40.55 | 1 | 0.0% |
| -40.43 | 1 | 0.0% |
| -40.42 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 48.22 | 1 | 0.0% |
| 52.86 | 1 | 0.0% |
| 54.48 | 1 | 0.0% |
| 63.71 | 1 | 0.0% |
| 63.8 | 1 | 0.0% |
2015_return_fund
Numeric
| Distinct count | 3070 |
|---|---|
| Unique (%) | 12.3% |
| Missing (%) | 22.8% |
| Missing (n) | 5688 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -1.9572 |
|---|---|
| Minimum | -62.11 |
| Maximum | 86.62 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | -62.11 |
|---|---|
| 5-th percentile | -13.24 |
| Q1 | -3.81 |
| Median | -1.16 |
| Q3 | 1.04 |
| 95-th percentile | 5.99 |
| Maximum | 86.62 |
| Range | 148.73 |
| Interquartile range | 4.85 |
Descriptive statistics
| Standard deviation | 6.3592 |
|---|---|
| Coef of variation | -3.249 |
| Kurtosis | 12.834 |
| Mean | -1.9572 |
| MAD | 4.0475 |
| Skewness | -1.6307 |
| Sum | -37798 |
| Variance | 40.439 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 41 | 0.2% |
| -1.5 | 35 | 0.1% |
| -0.87 | 35 | 0.1% |
| -1.12 | 35 | 0.1% |
| -0.04 | 34 | 0.1% |
| -1.66 | 34 | 0.1% |
| -1.17 | 32 | 0.1% |
| -1.42 | 32 | 0.1% |
| 0.32 | 31 | 0.1% |
| 0.3 | 31 | 0.1% |
| Other values (3059) | 18972 | 75.9% |
| (Missing) | 5688 | 22.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -62.11 | 1 | 0.0% |
| -61.76 | 1 | 0.0% |
| -56.95 | 1 | 0.0% |
| -56.49 | 2 | 0.0% |
| -56.43 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 22.61 | 1 | 0.0% |
| 29.51 | 1 | 0.0% |
| 30.81 | 1 | 0.0% |
| 85.18 | 1 | 0.0% |
| 86.62 | 1 | 0.0% |
2016_return_category
Numeric
| Distinct count | 102 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 12.4% |
| Missing (n) | 3097 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 7.2853 |
|---|---|
| Minimum | -21.11 |
| Maximum | 54.81 |
| Zeros (%) | 0.6% |
Quantile statistics
| Minimum | -21.11 |
|---|---|
| 5-th percentile | -0.25 |
| Q1 | 3.23 |
| Median | 6.23 |
| Q3 | 10.37 |
| 95-th percentile | 20.78 |
| Maximum | 54.81 |
| Range | 75.92 |
| Interquartile range | 7.14 |
Descriptive statistics
| Standard deviation | 6.7941 |
|---|---|
| Coef of variation | 0.93259 |
| Kurtosis | 6.7273 |
| Mean | 7.2853 |
| MAD | 4.7939 |
| Skewness | 1.4074 |
| Sum | 159570 |
| Variance | 46.16 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.23 | 2075 | 8.3% |
| 10.37 | 1137 | 4.5% |
| 14.81 | 1021 | 4.1% |
| 5.54 | 739 | 3.0% |
| 8.47 | 657 | 2.6% |
| 7.34 | 651 | 2.6% |
| 20.78 | 627 | 2.5% |
| 0.79 | 608 | 2.4% |
| 13.3 | 589 | 2.4% |
| 11.2 | 576 | 2.3% |
| Other values (91) | 13223 | 52.9% |
| (Missing) | 3097 | 12.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -21.11 | 52 | 0.2% |
| -10.6 | 98 | 0.4% |
| -2.98 | 118 | 0.5% |
| -2.75 | 97 | 0.4% |
| -2.14 | 366 | 1.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26.69 | 88 | 0.4% |
| 27.3 | 90 | 0.4% |
| 29.22 | 70 | 0.3% |
| 32.05 | 12 | 0.0% |
| 54.81 | 53 | 0.2% |
2016_return_fund
Numeric
| Distinct count | 3729 |
|---|---|
| Unique (%) | 14.9% |
| Missing (%) | 15.7% |
| Missing (n) | 3931 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 7.2878 |
|---|---|
| Minimum | -62.92 |
| Maximum | 80.1 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -62.92 |
|---|---|
| 5-th percentile | -2.22 |
| Q1 | 2.07 |
| Median | 6.32 |
| Q3 | 10.85 |
| 95-th percentile | 21.656 |
| Maximum | 80.1 |
| Range | 143.02 |
| Interquartile range | 8.78 |
Descriptive statistics
| Standard deviation | 8.1711 |
|---|---|
| Coef of variation | 1.1212 |
| Kurtosis | 7.1091 |
| Mean | 7.2878 |
| MAD | 5.7766 |
| Skewness | 1.1104 |
| Sum | 153550 |
| Variance | 66.767 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| -0.29 | 27 | 0.1% |
| -0.01 | 27 | 0.1% |
| 6.4 | 26 | 0.1% |
| 6.52 | 26 | 0.1% |
| 5.34 | 25 | 0.1% |
| 6.45 | 24 | 0.1% |
| 6.47 | 24 | 0.1% |
| 7.01 | 24 | 0.1% |
| 6.55 | 23 | 0.1% |
| 7.26 | 23 | 0.1% |
| Other values (3718) | 20820 | 83.3% |
| (Missing) | 3931 | 15.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -62.92 | 1 | 0.0% |
| -62.54 | 1 | 0.0% |
| -51.74 | 1 | 0.0% |
| -51.22 | 1 | 0.0% |
| -40.7 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 73.02 | 1 | 0.0% |
| 75.08 | 1 | 0.0% |
| 75.97 | 1 | 0.0% |
| 78.45 | 1 | 0.0% |
| 80.1 | 1 | 0.0% |
2017_category_return
Numeric
| Distinct count | 103 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 5.7% |
| Missing (n) | 1428 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 14.848 |
|---|---|
| Minimum | -27.04 |
| Maximum | 46.78 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -27.04 |
|---|---|
| 5-th percentile | 1.73 |
| Q1 | 6.25 |
| Median | 14.67 |
| Q3 | 21.5 |
| 95-th percentile | 31.58 |
| Maximum | 46.78 |
| Range | 73.82 |
| Interquartile range | 15.25 |
Descriptive statistics
| Standard deviation | 9.6485 |
|---|---|
| Coef of variation | 0.64982 |
| Kurtosis | -0.023909 |
| Mean | 14.848 |
| MAD | 7.9925 |
| Skewness | 0.16038 |
| Sum | 350000 |
| Variance | 93.094 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 27.67 | 1285 | 5.1% |
| 20.44 | 1214 | 4.9% |
| 15.94 | 1088 | 4.4% |
| 3.71 | 915 | 3.7% |
| 23.61 | 807 | 3.2% |
| 34.17 | 719 | 2.9% |
| 13.21 | 676 | 2.7% |
| 12.28 | 672 | 2.7% |
| 25.12 | 641 | 2.6% |
| 6.47 | 635 | 2.5% |
| Other values (92) | 14920 | 59.7% |
| (Missing) | 1428 | 5.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -27.04 | 53 | 0.2% |
| -5.78 | 99 | 0.4% |
| -4.84 | 71 | 0.3% |
| 0.56 | 92 | 0.4% |
| 0.77 | 26 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35.35 | 152 | 0.6% |
| 36.19 | 131 | 0.5% |
| 37.39 | 71 | 0.3% |
| 42.4 | 51 | 0.2% |
| 46.78 | 15 | 0.1% |
2017_return_fund
Highly correlated
This variable is highly correlated with 2017_category_return and should be ignored for analysis
| Correlation | 0.91277 |
|---|---|
2018_return_category
Numeric
| Distinct count | 100 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 3.2% |
| Missing (n) | 809 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -6.4862 |
|---|---|
| Minimum | -27.27 |
| Maximum | 7.19 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -27.27 |
|---|---|
| 5-th percentile | -16.07 |
| Q1 | -9.27 |
| Median | -6.25 |
| Q3 | -2.09 |
| 95-th percentile | 0.92 |
| Maximum | 7.19 |
| Range | 34.46 |
| Interquartile range | 7.18 |
Descriptive statistics
| Standard deviation | 5.4202 |
|---|---|
| Coef of variation | -0.83565 |
| Kurtosis | -0.16797 |
| Mean | -6.4862 |
| MAD | 4.3311 |
| Skewness | -0.51425 |
| Sum | -156910 |
| Variance | 29.379 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| -5.76 | 1330 | 5.3% |
| -2.09 | 1299 | 5.2% |
| -6.27 | 1236 | 4.9% |
| -8.53 | 1097 | 4.4% |
| -0.5 | 943 | 3.8% |
| -9.64 | 837 | 3.3% |
| -16.07 | 739 | 3.0% |
| -12.72 | 676 | 2.7% |
| -14.59 | 667 | 2.7% |
| -2.59 | 648 | 2.6% |
| Other values (89) | 14719 | 58.9% |
| (Missing) | 809 | 3.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -27.27 | 71 | 0.3% |
| -20.68 | 52 | 0.2% |
| -19.13 | 148 | 0.6% |
| -19.01 | 92 | 0.4% |
| -18.34 | 135 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.77 | 39 | 0.2% |
| 1.91 | 51 | 0.2% |
| 2.11 | 177 | 0.7% |
| 2.76 | 50 | 0.2% |
| 7.19 | 53 | 0.2% |
2018_return_fund
Numeric
| Distinct count | 3132 |
|---|---|
| Unique (%) | 12.5% |
| Missing (%) | 3.8% |
| Missing (n) | 940 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -6.6868 |
|---|---|
| Minimum | -59.1 |
| Maximum | 39.47 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -59.1 |
|---|---|
| 5-th percentile | -18.71 |
| Q1 | -10.52 |
| Median | -5.795 |
| Q3 | -1.62 |
| 95-th percentile | 1.52 |
| Maximum | 39.47 |
| Range | 98.57 |
| Interquartile range | 8.9 |
Descriptive statistics
| Standard deviation | 6.6815 |
|---|---|
| Coef of variation | -0.99922 |
| Kurtosis | 1.4985 |
| Mean | -6.6868 |
| MAD | 5.2661 |
| Skewness | -0.6087 |
| Sum | -160880 |
| Variance | 44.643 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| -4.95 | 28 | 0.1% |
| 0.31 | 28 | 0.1% |
| 0.53 | 28 | 0.1% |
| 0.14 | 28 | 0.1% |
| -5.17 | 27 | 0.1% |
| -7.65 | 27 | 0.1% |
| -4.7 | 27 | 0.1% |
| 0.64 | 27 | 0.1% |
| 0.63 | 27 | 0.1% |
| 0.85 | 26 | 0.1% |
| Other values (3121) | 23787 | 95.1% |
| (Missing) | 940 | 3.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -59.1 | 1 | 0.0% |
| -58.6 | 1 | 0.0% |
| -48.0 | 1 | 0.0% |
| -47.81 | 1 | 0.0% |
| -47.73 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23.6 | 2 | 0.0% |
| 28.39 | 1 | 0.0% |
| 29.63 | 1 | 0.0% |
| 37.94 | 1 | 0.0% |
| 39.47 | 1 | 0.0% |
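Missingness in the yearly return columns falls steadily over time. It goes from 46.2% for 2010_return_category and 49.0% for 2010_return_fund down to 3.2% and 3.8% for the 2018 columns. A likely reason is that younger funds have no history for the early years, but that is an interpretation the report does not state. The per-column "Missing (n)" and "Missing (%)" figures can be produced in one pass. `missing_report` is a hypothetical helper, and the toy column names are made up.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing count and percentage per column, most incomplete first."""
    n_missing = df.isna().sum()
    report = pd.DataFrame({
        "Missing (n)": n_missing,
        "Missing (%)": (100 * n_missing / len(df)).round(1),
    })
    return report.sort_values("Missing (%)", ascending=False)


toy = pd.DataFrame({"2010_return": [1.0, None, None, 4.0],   # made-up columns
                    "2018_return": [1.0, 2.0, 3.0, None]})
print(missing_report(toy))
```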
3_months_return_category
Highly correlated
This variable is highly correlated with ytd_return_category and should be ignored for analysis
| Correlation | 1 |
|---|---|
bond_percentage_of_porfolio
Numeric
| Distinct count | 2767 |
|---|---|
| Unique (%) | 11.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 30.782 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 46.3% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 2.17 |
| Q3 | 64.06 |
| 95-th percentile | 98.44 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 64.06 |
Descriptive statistics
| Standard deviation | 38.687 |
|---|---|
| Coef of variation | 1.2568 |
| Kurtosis | -1.0773 |
| Mean | 30.782 |
| MAD | 34.38 |
| Skewness | 0.78545 |
| Sum | 766040 |
| Variance | 1496.7 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 11583 | 46.3% |
| 100.0 | 216 | 0.9% |
| 99.99 | 49 | 0.2% |
| 0.01 | 37 | 0.1% |
| 11.63 | 35 | 0.1% |
| 0.08 | 35 | 0.1% |
| 0.14 | 29 | 0.1% |
| 94.46 | 27 | 0.1% |
| 99.98 | 25 | 0.1% |
| 94.99 | 25 | 0.1% |
| Other values (2756) | 12825 | 51.3% |
| (Missing) | 114 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 11583 | 46.3% |
| 0.01 | 37 | 0.1% |
| 0.02 | 18 | 0.1% |
| 0.03 | 20 | 0.1% |
| 0.04 | 21 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.95 | 6 | 0.0% |
| 99.97 | 25 | 0.1% |
| 99.98 | 25 | 0.1% |
| 99.99 | 49 | 0.2% |
| 100.0 | 216 | 0.9% |
cash_percent_of_portfolio
Numeric
| Distinct count | 2083 |
|---|---|
| Unique (%) | 8.3% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 7.3818 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 5.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1.24 |
| Median | 3.14 |
| Q3 | 7.04 |
| 95-th percentile | 32.21 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 5.8 |
Descriptive statistics
| Standard deviation | 12.9 |
|---|---|
| Coef of variation | 1.7475 |
| Kurtosis | 18.684 |
| Mean | 7.3818 |
| MAD | 7.4007 |
| Skewness | 3.8862 |
| Sum | 183700 |
| Variance | 166.4 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 1253 | 5.0% |
| 0.01 | 124 | 0.5% |
| 1.64 | 81 | 0.3% |
| 1.4 | 75 | 0.3% |
| 1.62 | 74 | 0.3% |
| 0.88 | 74 | 0.3% |
| 100.0 | 71 | 0.3% |
| 0.99 | 70 | 0.3% |
| 3.15 | 69 | 0.3% |
| 1.59 | 67 | 0.3% |
| Other values (2072) | 22928 | 91.7% |
| (Missing) | 114 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 1253 | 5.0% |
| 0.01 | 124 | 0.5% |
| 0.02 | 55 | 0.2% |
| 0.03 | 43 | 0.2% |
| 0.04 | 14 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.04 | 4 | 0.0% |
| 99.43 | 1 | 0.0% |
| 99.52 | 4 | 0.0% |
| 99.91 | 2 | 0.0% |
| 100.0 | 71 | 0.3% |
category_ratio_net_annual_expense
Numeric
| Distinct count | 74 |
|---|---|
| Unique (%) | 0.3% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.0135 |
|---|---|
| Minimum | 0.39 |
| Maximum | 2.6 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 0.39 |
|---|---|
| 5-th percentile | 0.45 |
| Q1 | 0.81 |
| Median | 1.02 |
| Q3 | 1.18 |
| 95-th percentile | 1.57 |
| Maximum | 2.6 |
| Range | 2.21 |
| Interquartile range | 0.37 |
Descriptive statistics
| Standard deviation | 0.329 |
|---|---|
| Coef of variation | 0.32461 |
| Kurtosis | 2.2873 |
| Mean | 1.0135 |
| MAD | 0.2355 |
| Skewness | 0.78106 |
| Sum | 25338 |
| Variance | 0.10824 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1.06 | 1971 | 7.9% |
| 0.94 | 1515 | 6.1% |
| 0.45 | 1351 | 5.4% |
| 1.11 | 1141 | 4.6% |
| 1.01 | 1133 | 4.5% |
| 1.0 | 1126 | 4.5% |
| 0.76 | 960 | 3.8% |
| 1.36 | 894 | 3.6% |
| 0.75 | 864 | 3.5% |
| 0.82 | 793 | 3.2% |
| Other values (64) | 13252 | 53.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.39 | 199 | 0.8% |
| 0.43 | 135 | 0.5% |
| 0.44 | 240 | 1.0% |
| 0.45 | 1351 | 5.4% |
| 0.46 | 249 | 1.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.08 | 8 | 0.0% |
| 2.17 | 392 | 1.6% |
| 2.18 | 53 | 0.2% |
| 2.28 | 4 | 0.0% |
| 2.6 | 15 | 0.1% |
category_return_1month
Numeric
| Distinct count | 88 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.87087 |
|---|---|
| Minimum | -3.49 |
| Maximum | 9.68 |
| Zeros (%) | 0.8% |
Quantile statistics
| Minimum | -3.49 |
|---|---|
| 5-th percentile | -1.21 |
| Q1 | 0.46 |
| Median | 1.11 |
| Q3 | 1.38 |
| 95-th percentile | 2.12 |
| Maximum | 9.68 |
| Range | 13.17 |
| Interquartile range | 0.92 |
Descriptive statistics
| Standard deviation | 1.2061 |
|---|---|
| Coef of variation | 1.3849 |
| Kurtosis | 4.37 |
| Mean | 0.87087 |
| MAD | 0.79881 |
| Skewness | -0.76316 |
| Sum | 21671 |
| Variance | 1.4547 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1.16 | 1442 | 5.8% |
| 2.12 | 1333 | 5.3% |
| 1.29 | 1270 | 5.1% |
| 0.7 | 1192 | 4.8% |
| 1.7 | 1171 | 4.7% |
| 0.46 | 1121 | 4.5% |
| 1.11 | 1103 | 4.4% |
| 1.14 | 757 | 3.0% |
| -2.31 | 683 | 2.7% |
| 0.8 | 666 | 2.7% |
| Other values (77) | 14147 | 56.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -3.49 | 74 | 0.3% |
| -3.24 | 15 | 0.1% |
| -3.2 | 418 | 1.7% |
| -2.55 | 2 | 0.0% |
| -2.31 | 683 | 2.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.67 | 202 | 0.8% |
| 3.82 | 54 | 0.2% |
| 4.2 | 106 | 0.4% |
| 5.19 | 24 | 0.1% |
| 9.68 | 15 | 0.1% |
category_return_1year
Numeric
| Distinct count | 100 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.7365 |
|---|---|
| Minimum | -10.92 |
| Maximum | 17.48 |
| Zeros (%) | 0.8% |
Quantile statistics
| Minimum | -10.92 |
|---|---|
| 5-th percentile | -7.87 |
| Q1 | 0.66 |
| Median | 3.07 |
| Q3 | 4.52 |
| 95-th percentile | 10.71 |
| Maximum | 17.48 |
| Range | 28.4 |
| Interquartile range | 3.86 |
Descriptive statistics
| Standard deviation | 5.0265 |
|---|---|
| Coef of variation | 1.8368 |
| Kurtosis | 0.89117 |
| Mean | 2.7365 |
| MAD | 3.5542 |
| Skewness | -0.31835 |
| Sum | 68098 |
| Variance | 25.265 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 10.71 | 1333 | 5.3% |
| 6.9 | 1270 | 5.1% |
| 4.48 | 1121 | 4.5% |
| 3.98 | 1113 | 4.5% |
| 1.85 | 850 | 3.4% |
| -5.01 | 773 | 3.1% |
| -9.31 | 757 | 3.0% |
| 3.9 | 708 | 2.8% |
| -0.03 | 683 | 2.7% |
| 4.33 | 666 | 2.7% |
| Other values (89) | 15611 | 62.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -10.92 | 51 | 0.2% |
| -10.58 | 102 | 0.4% |
| -10.06 | 53 | 0.2% |
| -9.96 | 136 | 0.5% |
| -9.52 | 75 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.71 | 1333 | 5.3% |
| 13.05 | 106 | 0.4% |
| 14.41 | 199 | 0.8% |
| 17.08 | 225 | 0.9% |
| 17.48 | 50 | 0.2% |
category_return_2015
Numeric
| Distinct count | 99 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 18.4% |
| Missing (n) | 4601 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -2.253 |
|---|---|
| Minimum | -34.98 |
| Maximum | 11.97 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -34.98 |
|---|---|
| 5-th percentile | -13.79 |
| Q1 | -4.01 |
| Median | -1.69 |
| Q3 | -0.26 |
| 95-th percentile | 3.6 |
| Maximum | 11.97 |
| Range | 46.95 |
| Interquartile range | 3.75 |
Descriptive statistics
| Standard deviation | 4.9993 |
|---|---|
| Coef of variation | -2.219 |
| Kurtosis | 12.296 |
| Mean | -2.253 |
| MAD | 2.9549 |
| Skewness | -2.667 |
| Sum | -45958 |
| Variance | 24.993 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.6 | 1144 | 4.6% |
| -1.07 | 1088 | 4.4% |
| -4.05 | 971 | 3.9% |
| -0.26 | 801 | 3.2% |
| -1.59 | 738 | 3.0% |
| -1.69 | 688 | 2.8% |
| -1.93 | 625 | 2.5% |
| -13.79 | 623 | 2.5% |
| -5.38 | 562 | 2.2% |
| -4.01 | 560 | 2.2% |
| Other values (88) | 12599 | 50.4% |
| (Missing) | 4601 | 18.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -34.98 | 88 | 0.4% |
| -29.95 | 12 | 0.0% |
| -27.39 | 58 | 0.2% |
| -23.99 | 92 | 0.4% |
| -23.25 | 53 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.15 | 12 | 0.0% |
| 5.21 | 138 | 0.6% |
| 7.05 | 100 | 0.4% |
| 8.05 | 95 | 0.4% |
| 11.97 | 22 | 0.1% |
fund_return_3months
Highly correlated
This variable is highly correlated with ytd_return_fund and should be ignored for analysis
| Correlation | 0.97222 |
|---|---|
fund_return_3years
Numeric
| Distinct count | 2650 |
|---|---|
| Unique (%) | 10.6% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 6.9992 |
|---|---|
| Minimum | -36.02 |
| Maximum | 38.42 |
| Zeros (%) | 6.2% |
Quantile statistics
| Minimum | -36.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.81 |
| Median | 6.82 |
| Q3 | 10.21 |
| 95-th percentile | 16.16 |
| Maximum | 38.42 |
| Range | 74.44 |
| Interquartile range | 7.4 |
Descriptive statistics
| Standard deviation | 5.4604 |
|---|---|
| Coef of variation | 0.78016 |
| Kurtosis | 3.82 |
| Mean | 6.9992 |
| MAD | 4.1873 |
| Skewness | 0.065025 |
| Sum | 174170 |
| Variance | 29.816 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 1540 | 6.2% |
| 7.12 | 35 | 0.1% |
| 6.32 | 35 | 0.1% |
| 1.65 | 33 | 0.1% |
| 8.99 | 31 | 0.1% |
| 1.67 | 31 | 0.1% |
| 2.21 | 30 | 0.1% |
| 7.17 | 30 | 0.1% |
| 9.65 | 30 | 0.1% |
| 8.5 | 30 | 0.1% |
| Other values (2639) | 23060 | 92.2% |
| (Missing) | 115 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -36.02 | 1 | 0.0% |
| -35.33 | 1 | 0.0% |
| -34.52 | 1 | 0.0% |
| -34.12 | 1 | 0.0% |
| -33.88 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 34.35 | 1 | 0.0% |
| 34.57 | 1 | 0.0% |
| 35.18 | 2 | 0.0% |
| 37.04 | 1 | 0.0% |
| 38.42 | 1 | 0.0% |
greatstone_rating
Numeric
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 20.0% |
| Missing (n) | 5000 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.8397 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros (%) | 5.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| Median | 3 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range | 2 |
Descriptive statistics
| Standard deviation | 1.2774 |
|---|---|
| Coef of variation | 0.44984 |
| Kurtosis | -0.17408 |
| Mean | 2.8397 |
| MAD | 0.99599 |
| Skewness | -0.448 |
| Sum | 56795 |
| Variance | 1.6319 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
| 2.0 | 4230 | 16.9% |
| 5.0 | 1629 | 6.5% |
| 1.0 | 1376 | 5.5% |
| 0.0 | 1365 | 5.5% |
| (Missing) | 5000 | 20.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 1365 | 5.5% |
| 1.0 | 1376 | 5.5% |
| 2.0 | 4230 | 16.9% |
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 1376 | 5.5% |
| 2.0 | 4230 | 16.9% |
| 3.0 | 6786 | 27.1% |
| 4.0 | 4614 | 18.5% |
| 5.0 | 1629 | 6.5% |
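greatstone_rating is the prediction target. Exactly 5000 rows (20.0%) have no rating. These are presumably the hackathon's scoring rows, though the report does not say so. Among the 20000 rated funds the classes are imbalanced: rating 3 covers 6786 funds (about 33.9%) while rating 0 covers 1365 (about 6.8%). The sketch below separates rated from unrated rows, assuming the CSVs have already been merged into one DataFrame. `split_by_target` is a hypothetical helper, and the toy frame is made up.

```python
import pandas as pd


def split_by_target(df: pd.DataFrame, target: str = "greatstone_rating"):
    """Return (rows with a rating, rows without one); the latter lose the target column."""
    has_label = df[target].notna()
    labeled = df[has_label].copy()
    labeled[target] = labeled[target].astype(int)  # ratings are whole numbers 0-5
    unlabeled = df[~has_label].drop(columns=[target])
    return labeled, unlabeled


toy = pd.DataFrame({"tag": [1, 2, 3, 4, 5],                        # made-up rows
                    "greatstone_rating": [3.0, None, 5.0, 0.0, None]})
labeled, unlabeled = split_by_target(toy)
# Class shares among labeled rows only (the report's percentages include missing rows).
print(labeled["greatstone_rating"].value_counts(normalize=True).sort_index())
```

Because of the imbalance, a stratified hold-out is one reasonable choice when carving a validation set from the labeled rows. An example is scikit-learn's `train_test_split(..., stratify=y)`.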
mmc
Categorical
| Distinct count | 5689 |
|---|---|
| Unique (%) | 22.8% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Value | Count | Frequency (%) | |
| 0 | 6008 | 24.0% |
|
| 828.01 | 75 | 0.3% |
|
| 2,193.13 | 41 | 0.2% |
|
| 9,234.14 | 34 | 0.1% |
|
| 88,146.69 | 17 | 0.1% |
|
| 95,232.43 | 17 | 0.1% |
|
| 1,063.09 | 17 | 0.1% |
|
| 43,954.74 | 17 | 0.1% |
|
| 39,247.34 | 17 | 0.1% |
|
| 23,042.48 | 17 | 0.1% |
|
| Other values (5678) | 18626 | 74.5% |
|
| (Missing) | 114 | 0.5% |
|
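mmc is profiled as Categorical even though its values are amounts, apparently because they are stored with thousands separators (for example 2,193.13 and 88,146.69), so pandas reads the column as text. A minimal sketch, assuming commas are only ever used as thousands separators in this column, strips them and converts with `pd.to_numeric`; `parse_thousands` is a hypothetical helper.

```python
import pandas as pd


def parse_thousands(series: pd.Series) -> pd.Series:
    """Turn strings such as '2,193.13' into floats; missing values stay NaN."""
    if pd.api.types.is_numeric_dtype(series):
        return series.astype(float)  # already numeric, nothing to strip
    cleaned = series.str.replace(",", "", regex=False)
    # errors="coerce" turns anything still unparseable into NaN
    return pd.to_numeric(cleaned, errors="coerce")


mmc = pd.Series(["0", "828.01", "2,193.13", "88,146.69", None])
print(parse_thousands(mmc).tolist())
```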
pb_ratio
Numeric
| Distinct count | 604 |
|---|---|
| Unique (%) | 2.4% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.8543 |
|---|---|
| Minimum | 0 |
| Maximum | 123.3 |
| Zeros (%) | 24.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.56 |
| Median | 1.85 |
| Q3 | 2.38 |
| 95-th percentile | 4.5 |
| Maximum | 123.3 |
| Range | 123.3 |
| Interquartile range | 1.82 |
Descriptive statistics
| Standard deviation | 2.9842 |
|---|---|
| Coef of variation | 1.6094 |
| Kurtosis | 1211.6 |
| Mean | 1.8543 |
| MAD | 1.1158 |
| Skewness | 30.129 |
| Sum | 46145 |
| Variance | 8.9057 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.0 | 6059 | 24.2% |
|
| 2.0 | 235 | 0.9% |
|
| 2.01 | 218 | 0.9% |
|
| 1.94 | 181 | 0.7% |
|
| 2.13 | 180 | 0.7% |
|
| 1.96 | 173 | 0.7% |
|
| 2.03 | 172 | 0.7% |
|
| 1.92 | 170 | 0.7% |
|
| 2.02 | 167 | 0.7% |
|
| 1.99 | 158 | 0.6% |
|
| Other values (593) | 17173 | 68.7% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 6059 | 24.2% |
|
| 0.12 | 2 | 0.0% |
|
| 0.26 | 7 | 0.0% |
|
| 0.27 | 6 | 0.0% |
|
| 0.29 | 5 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 10.77 | 1 | 0.0% |
|
| 11.17 | 2 | 0.0% |
|
| 14.07 | 4 | 0.0% |
|
| 22.47 | 17 | 0.1% |
|
| 123.3 | 11 | 0.0% |
|
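pb_ratio is extremely heavy-tailed: the 95th percentile is 4.5 but the maximum is 123.3 (11 funds), giving a kurtosis of about 1211. One common option, sketched here as an assumption rather than as the notebook's method, is to cap values at an upper quantile and then apply `np.log1p`, which keeps the 24.2% exact zeros at 0. The cap should be fitted on the training rows only; `clip_upper_quantile` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def clip_upper_quantile(s: pd.Series, q: float = 0.99) -> pd.Series:
    """Cap values above the q-th quantile; Series.quantile skips NaN by default."""
    return s.clip(upper=s.quantile(q))


# Illustrative values echoing the report: zeros, typical ratios and the 123.3 outlier.
pb = pd.Series([0.0, 0.0, 1.85, 2.38, 4.5, 123.3])
capped = clip_upper_quantile(pb, q=0.8)
logged = np.log1p(capped)  # log(1 + x) keeps exact zeros at 0
print(capped.tolist())
```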
pc_ratio
Categorical
| Distinct count | 1584 |
|---|---|
| Unique (%) | 6.3% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |

| Value | Count |
|---|---|
| 0 | 4144 |
| 0.0 | 1900 |
| 6.99 | 98 |
| Other values (1580) | 18744 |
| (Missing) | 114 |

| Value | Count | Frequency (%) | |
| 0 | 4144 | 16.6% |
|
| 0.0 | 1900 | 7.6% |
|
| 6.99 | 98 | 0.4% |
|
| 7.18 | 98 | 0.4% |
|
| 0.46 | 92 | 0.4% |
|
| 7.21 | 81 | 0.3% |
|
| 7.63 | 78 | 0.3% |
|
| 7.54 | 77 | 0.3% |
|
| 7.23 | 76 | 0.3% |
|
| 7.96 | 76 | 0.3% |
|
| Other values (1573) | 18166 | 72.7% |
|
| (Missing) | 114 | 0.5% |
|
pe_ratio
Categorical
| Distinct count | 1782 |
|---|---|
| Unique (%) | 7.1% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |

| Value | Count |
|---|---|
| 0 | 4128 |
| 0.0 | 1910 |
| 3.65 | 92 |
| Other values (1778) | 18756 |
| (Missing) | 114 |

| Value | Count | Frequency (%) | |
| 0 | 4128 | 16.5% |
|
| 0.0 | 1910 | 7.6% |
|
| 3.65 | 92 | 0.4% |
|
| 15.14 | 89 | 0.4% |
|
| 15.37 | 87 | 0.3% |
|
| 15.87 | 86 | 0.3% |
|
| 17.05 | 69 | 0.3% |
|
| 16.15 | 67 | 0.3% |
|
| 16.57 | 66 | 0.3% |
|
| 15.09 | 66 | 0.3% |
|
| Other values (1771) | 18226 | 72.9% |
|
| (Missing) | 114 | 0.5% |
|
portfolio_convertable
Numeric
| Distinct count | 400 |
|---|---|
| Unique (%) | 1.6% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.57097 |
|---|---|
| Minimum | 0 |
| Maximum | 98.86 |
| Zeros (%) | 68.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.07 |
| 95-th percentile | 1.29 |
| Maximum | 98.86 |
| Range | 98.86 |
| Interquartile range | 0.07 |
Descriptive statistics
| Standard deviation | 4.8273 |
|---|---|
| Coef of variation | 8.4545 |
| Kurtosis | 231.33 |
| Mean | 0.57097 |
| MAD | 0.96172 |
| Skewness | 14.695 |
| Sum | 14209 |
| Variance | 23.303 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.0 | 17032 | 68.1% |
|
| 0.01 | 309 | 1.2% |
|
| 0.08 | 290 | 1.2% |
|
| 0.02 | 275 | 1.1% |
|
| 0.06 | 274 | 1.1% |
|
| 0.05 | 249 | 1.0% |
|
| 0.03 | 238 | 1.0% |
|
| 0.1 | 233 | 0.9% |
|
| 0.07 | 233 | 0.9% |
|
| 0.09 | 221 | 0.9% |
|
| Other values (389) | 5532 | 22.1% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 17032 | 68.1% |
|
| 0.01 | 309 | 1.2% |
|
| 0.02 | 275 | 1.1% |
|
| 0.03 | 238 | 1.0% |
|
| 0.04 | 207 | 0.8% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 83.05 | 3 | 0.0% |
|
| 83.23 | 5 | 0.0% |
|
| 83.49 | 2 | 0.0% |
|
| 83.89 | 3 | 0.0% |
|
| 98.86 | 4 | 0.0% |
|
portfolio_others
Numeric
| Distinct count | 723 |
|---|---|
| Unique (%) | 2.9% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.0558 |
|---|---|
| Minimum | 0 |
| Maximum | 98.84 |
| Zeros (%) | 56.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.28 |
| 95-th percentile | 5.12 |
| Maximum | 98.84 |
| Range | 98.84 |
| Interquartile range | 0.28 |
Descriptive statistics
| Standard deviation | 4.4478 |
|---|---|
| Coef of variation | 4.2128 |
| Kurtosis | 128.3 |
| Mean | 1.0558 |
| MAD | 1.6447 |
| Skewness | 9.7283 |
| Sum | 26274 |
| Variance | 19.783 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.0 | 14058 | 56.2% |
|
| 0.04 | 492 | 2.0% |
|
| 0.01 | 484 | 1.9% |
|
| 0.02 | 374 | 1.5% |
|
| 0.05 | 355 | 1.4% |
|
| 0.03 | 353 | 1.4% |
|
| 0.06 | 267 | 1.1% |
|
| 0.08 | 216 | 0.9% |
|
| 0.07 | 193 | 0.8% |
|
| 0.09 | 180 | 0.7% |
|
| Other values (712) | 7914 | 31.7% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 14058 | 56.2% |
|
| 0.01 | 484 | 1.9% |
|
| 0.02 | 374 | 1.5% |
|
| 0.03 | 353 | 1.4% |
|
| 0.04 | 492 | 2.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 64.87 | 7 | 0.0% |
|
| 66.96 | 1 | 0.0% |
|
| 77.87 | 7 | 0.0% |
|
| 93.57 | 4 | 0.0% |
|
| 98.84 | 1 | 0.0% |
|
portfolio_preferred
Numeric
| Distinct count | 380 |
|---|---|
| Unique (%) | 1.5% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.31252 |
|---|---|
| Minimum | 0 |
| Maximum | 80.87 |
| Zeros (%) | 73.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.01 |
| 95-th percentile | 0.9675 |
| Maximum | 80.87 |
| Range | 80.87 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 2.1508 |
|---|---|
| Coef of variation | 6.8821 |
| Kurtosis | 452.32 |
| Mean | 0.31252 |
| MAD | 0.5342 |
| Skewness | 17.687 |
| Sum | 7777.3 |
| Variance | 4.6258 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.0 | 18290 | 73.2% |
|
| 0.01 | 835 | 3.3% |
|
| 0.02 | 467 | 1.9% |
|
| 0.03 | 306 | 1.2% |
|
| 0.1 | 212 | 0.8% |
|
| 0.04 | 204 | 0.8% |
|
| 0.05 | 200 | 0.8% |
|
| 0.07 | 195 | 0.8% |
|
| 0.08 | 178 | 0.7% |
|
| 0.06 | 161 | 0.6% |
|
| Other values (369) | 3838 | 15.4% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 18290 | 73.2% |
|
| 0.01 | 835 | 3.3% |
|
| 0.02 | 467 | 1.9% |
|
| 0.03 | 306 | 1.2% |
|
| 0.04 | 204 | 0.8% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 45.86 | 2 | 0.0% |
|
| 53.32 | 2 | 0.0% |
|
| 56.65 | 4 | 0.0% |
|
| 62.37 | 1 | 0.0% |
|
| 80.87 | 3 | 0.0% |
|
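portfolio_convertable, portfolio_others and portfolio_preferred are zero-inflated (68.1%, 56.2% and 73.2% zeros) with long right tails. A hypothetical way to expose the "holds any of this at all" signal separately from the amount is a 0/1 indicator per column; this sketch leaves missing values as NaN so they can be imputed on their own terms. The `has_` prefix is an assumed naming convention.

```python
import pandas as pd


def add_nonzero_flags(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Add a has_<col> column: 1.0 if the value is > 0, 0.0 if it is 0, NaN if missing."""
    out = df.copy()
    for col in cols:
        flag = (out[col] > 0).astype(float)
        out[f"has_{col}"] = flag.where(out[col].notna())  # keep missing as NaN
    return out


toy = pd.DataFrame({
    "portfolio_convertable": [0.0, 0.07, None],
    "portfolio_preferred": [0.0, 0.0, 1.2],
})
flagged = add_nonzero_flags(toy, ["portfolio_convertable", "portfolio_preferred"])
print(flagged)
```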
ps_ratio
Categorical
| Distinct count | 556 |
|---|---|
| Unique (%) | 2.2% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |

| Value | Count |
|---|---|
| 0.0 | 3959 |
| 0 | 2026 |
| 1.49 | 273 |
| Other values (552) | 18628 |

| Value | Count | Frequency (%) | |
| 0.0 | 3959 | 15.8% |
|
| 0 | 2026 | 8.1% |
|
| 1.49 | 273 | 1.1% |
|
| 1.47 | 252 | 1.0% |
|
| 1.51 | 249 | 1.0% |
|
| 1.45 | 238 | 1.0% |
|
| 1.5 | 221 | 0.9% |
|
| 0.99 | 193 | 0.8% |
|
| 1.31 | 183 | 0.7% |
|
| 1.54 | 180 | 0.7% |
|
| Other values (545) | 17112 | 68.4% |
|
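pc_ratio, pe_ratio and ps_ratio are also profiled as Categorical, and their frequency tables list `0` and `0.0` as separate values (for ps_ratio, 2026 and 3959 rows), so the same zero is split across two categories. Converting with `pd.to_numeric` merges them; this sketch also counts non-empty entries that fail to parse, so that coercion to NaN does not silently hide unexpected tokens. `to_float_report` and the sample tokens (including "n/a") are hypothetical.

```python
import pandas as pd


def to_float_report(series: pd.Series):
    """Parse a text column as float; also return how many non-missing entries failed."""
    parsed = pd.to_numeric(series, errors="coerce")
    failed = int((parsed.isna() & series.notna()).sum())
    return parsed, failed


pc = pd.Series(["0", "0.0", "6.99", "7.18", "n/a", None])
parsed, failed = to_float_report(pc)
print(parsed.value_counts(dropna=False))
print("unparseable entries:", failed)
```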
stock_percent_of_portfolio
Numeric
| Distinct count | 2739 |
|---|---|
| Unique (%) | 11.0% |
| Missing (%) | 0.5% |
| Missing (n) | 114 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 59.122 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 20.6% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.5225 |
| Median | 82.65 |
| Q3 | 97.6 |
| 95-th percentile | 99.6 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 97.078 |
Descriptive statistics
| Standard deviation | 42.251 |
|---|---|
| Coef of variation | 0.71464 |
| Kurtosis | -1.5795 |
| Mean | 59.122 |
| MAD | 39.172 |
| Skewness | -0.44936 |
| Sum | 1471300 |
| Variance | 1785.1 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.0 | 5145 | 20.6% |
|
| 100.0 | 444 | 1.8% |
|
| 0.01 | 144 | 0.6% |
|
| 0.02 | 84 | 0.3% |
|
| 0.03 | 84 | 0.3% |
|
| 97.8 | 62 | 0.2% |
|
| 99.05 | 61 | 0.2% |
|
| 0.04 | 52 | 0.2% |
|
| 98.78 | 51 | 0.2% |
|
| 99.29 | 50 | 0.2% |
|
| Other values (2728) | 18709 | 74.8% |
|
| (Missing) | 114 | 0.5% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 5145 | 20.6% |
|
| 0.01 | 144 | 0.6% |
|
| 0.02 | 84 | 0.3% |
|
| 0.03 | 84 | 0.3% |
|
| 0.04 | 52 | 0.2% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 99.96 | 12 | 0.0% |
|
| 99.97 | 14 | 0.1% |
|
| 99.98 | 27 | 0.1% |
|
| 99.99 | 34 | 0.1% |
|
| 100.0 | 444 | 1.8% |
|
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
|---|---|
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 165887 | 1 | 0.0% |
|
| 193211 | 1 | 0.0% |
|
| 86687 | 1 | 0.0% |
|
| 174752 | 1 | 0.0% |
|
| 41633 | 1 | 0.0% |
|
| 144035 | 1 | 0.0% |
|
| 232100 | 1 | 0.0% |
|
| 98981 | 1 | 0.0% |
|
| 39590 | 1 | 0.0% |
|
| 201383 | 1 | 0.0% |
|
| Other values (24990) | 24990 | 100.0% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 26000 | 1 | 0.0% |
|
| 26009 | 1 | 0.0% |
|
| 26018 | 1 | 0.0% |
|
| 26027 | 1 | 0.0% |
|
| 26036 | 1 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 253727 | 1 | 0.0% |
|
| 253736 | 1 | 0.0% |
|
| 253745 | 1 | 0.0% |
|
| 253754 | 1 | 0.0% |
|
| 253763 | 1 | 0.0% |
|
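tag has 25000 distinct values for 25000 rows, consistent with the earlier note that tag is the unique fund identifier. A sketch of how it might serve as the join key between the CSV files, assuming each file holds one row per tag: `validate="one_to_one"` makes `DataFrame.merge` raise `pandas.errors.MergeError` if either side has duplicate tags, and the identifier is dropped before modelling because an ID is not expected to describe the fund. The toy frames are illustrative.

```python
import pandas as pd


def merge_on_tag(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Left-join two per-fund tables on tag; raise MergeError if tag repeats on either side."""
    return left.merge(right, on="tag", how="left", validate="one_to_one")


ratios = pd.DataFrame({"tag": [26000, 26009, 26018], "pb_ratio": [1.85, 0.0, 2.38]})
returns = pd.DataFrame({"tag": [26009, 26000, 26018], "years_up": [7.0, 3.0, 12.0]})
merged = merge_on_tag(ratios, returns)
features = merged.drop(columns="tag")  # the identifier itself is not a feature
print(merged)
```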
years_down
Numeric
| Distinct count | 27 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 6.6% |
| Missing (n) | 1641 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 3.2425 |
|---|---|
| Minimum | 1 |
| Maximum | 28 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| Median | 3 |
| Q3 | 4 |
| 95-th percentile | 7 |
| Maximum | 28 |
| Range | 27 |
| Interquartile range | 3 |
Descriptive statistics
| Standard deviation | 2.3227 |
|---|---|
| Coef of variation | 0.71634 |
| Kurtosis | 8.241 |
| Mean | 3.2425 |
| MAD | 1.746 |
| Skewness | 1.9961 |
| Sum | 75742 |
| Variance | 5.3951 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 1.0 | 5945 | 23.8% |
|
| 2.0 | 4950 | 19.8% |
|
| 3.0 | 3752 | 15.0% |
|
| 4.0 | 3286 | 13.1% |
|
| 5.0 | 2162 | 8.6% |
|
| 6.0 | 1330 | 5.3% |
|
| 7.0 | 828 | 3.3% |
|
| 8.0 | 472 | 1.9% |
|
| 9.0 | 261 | 1.0% |
|
| 10.0 | 110 | 0.4% |
|
| Other values (16) | 263 | 1.1% |
|
| (Missing) | 1641 | 6.6% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 1.0 | 5945 | 23.8% |
|
| 2.0 | 4950 | 19.8% |
|
| 3.0 | 3752 | 15.0% |
|
| 4.0 | 3286 | 13.1% |
|
| 5.0 | 2162 | 8.6% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 22.0 | 3 | 0.0% |
|
| 23.0 | 2 | 0.0% |
|
| 24.0 | 5 | 0.0% |
|
| 26.0 | 1 | 0.0% |
|
| 28.0 | 1 | 0.0% |
|
years_up
Numeric
| Distinct count | 68 |
|---|---|
| Unique (%) | 0.3% |
| Missing (%) | 7.2% |
| Missing (n) | 1812 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 8.4193 |
|---|---|
| Minimum | 1 |
| Maximum | 70 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| Median | 7 |
| Q3 | 12 |
| 95-th percentile | 21 |
| Maximum | 70 |
| Range | 69 |
| Interquartile range | 9 |
Descriptive statistics
| Standard deviation | 6.9673 |
|---|---|
| Coef of variation | 0.82754 |
| Kurtosis | 6.8523 |
| Mean | 8.4193 |
| MAD | 5.303 |
| Skewness | 1.8537 |
| Sum | 195230 |
| Variance | 48.544 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 2.0 | 3100 | 12.4% |
|
| 1.0 | 1896 | 7.6% |
|
| 3.0 | 1718 | 6.9% |
|
| 7.0 | 1598 | 6.4% |
|
| 5.0 | 1598 | 6.4% |
|
| 4.0 | 1537 | 6.1% |
|
| 8.0 | 1302 | 5.2% |
|
| 6.0 | 1296 | 5.2% |
|
| 12.0 | 1016 | 4.1% |
|
| 9.0 | 972 | 3.9% |
|
| Other values (57) | 7155 | 28.6% |
|
| (Missing) | 1812 | 7.2% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 1.0 | 1896 | 7.6% |
|
| 2.0 | 3100 | 12.4% |
|
| 3.0 | 1718 | 6.9% |
|
| 4.0 | 1537 | 6.1% |
|
| 5.0 | 1598 | 6.4% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 66.0 | 3 | 0.0% |
|
| 67.0 | 1 | 0.0% |
|
| 68.0 | 2 | 0.0% |
|
| 69.0 | 1 | 0.0% |
|
| 70.0 | 2 | 0.0% |
|
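years_up (median 7) and years_down (median 3) are raw counts, so funds with a longer history tend to score higher on both. Assuming the two columns count calendar years with positive and negative returns, a hypothetical normalised feature is the share of up years; rows where either count is missing stay NaN. `up_year_share` is an illustrative helper, not part of the dataset.

```python
import numpy as np
import pandas as pd


def up_year_share(df: pd.DataFrame) -> pd.Series:
    """years_up / (years_up + years_down); NaN if either count is missing."""
    total = df["years_up"] + df["years_down"]
    return df["years_up"] / total.replace(0, np.nan)  # guard against 0/0


toy = pd.DataFrame({"years_up": [7.0, 2.0, np.nan], "years_down": [3.0, 2.0, 1.0]})
print(up_year_share(toy).tolist())
```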
ytd_return_category
Numeric
| Distinct count | 100 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.1687 |
|---|---|
| Minimum | -17.38 |
| Maximum | 20.95 |
| Zeros (%) | 0.8% |
Quantile statistics
| Minimum | -17.38 |
|---|---|
| 5-th percentile | 1.71 |
| Q1 | 5.02 |
| Median | 10.24 |
| Q3 | 12.94 |
| 95-th percentile | 17.01 |
| Maximum | 20.95 |
| Range | 38.33 |
| Interquartile range | 7.92 |
Descriptive statistics
| Standard deviation | 4.9997 |
|---|---|
| Coef of variation | 0.5453 |
| Kurtosis | 0.50549 |
| Mean | 9.1687 |
| MAD | 4.2093 |
| Skewness | -0.31774 |
| Sum | 228160 |
| Variance | 24.997 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 12.94 | 1655 | 6.6% |
|
| 15.67 | 1333 | 5.3% |
|
| 11.29 | 1121 | 4.5% |
|
| 3.13 | 957 | 3.8% |
|
| 12.27 | 875 | 3.5% |
|
| 10.27 | 757 | 3.0% |
|
| 8.89 | 708 | 2.8% |
|
| 13.34 | 683 | 2.7% |
|
| 10.24 | 680 | 2.7% |
|
| 4.17 | 672 | 2.7% |
|
| Other values (89) | 15444 | 61.8% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -17.38 | 53 | 0.2% |
|
| 0.0 | 188 | 0.8% |
|
| 0.12 | 142 | 0.6% |
|
| 1.02 | 164 | 0.7% |
|
| 1.04 | 93 | 0.4% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 17.01 | 645 | 2.6% |
|
| 18.19 | 578 | 2.3% |
|
| 19.1 | 106 | 0.4% |
|
| 19.73 | 160 | 0.6% |
|
| 20.95 | 54 | 0.2% |
|
ytd_return_fund
Numeric
| Distinct count | 2744 |
|---|---|
| Unique (%) | 11.0% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.2898 |
|---|---|
| Minimum | -36.3 |
| Maximum | 46.29 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -36.3 |
|---|---|
| 5-th percentile | 1.37 |
| Q1 | 4.43 |
| Median | 9.82 |
| Q3 | 13.08 |
| 95-th percentile | 18.32 |
| Maximum | 46.29 |
| Range | 82.59 |
| Interquartile range | 8.65 |
Descriptive statistics
| Standard deviation | 5.7977 |
|---|---|
| Coef of variation | 0.6241 |
| Kurtosis | 2.2354 |
| Mean | 9.2898 |
| MAD | 4.6294 |
| Skewness | -0.11739 |
| Sum | 231180 |
| Variance | 33.613 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 11.88 | 36 | 0.1% |
|
| 2.45 | 36 | 0.1% |
|
| 11.33 | 34 | 0.1% |
|
| 10.94 | 34 | 0.1% |
|
| 11.76 | 33 | 0.1% |
|
| 2.76 | 32 | 0.1% |
|
| 2.62 | 32 | 0.1% |
|
| 11.21 | 31 | 0.1% |
|
| 3.4 | 31 | 0.1% |
|
| 10.27 | 31 | 0.1% |
|
| Other values (2733) | 24555 | 98.2% |
|
| (Missing) | 115 | 0.5% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -36.3 | 1 | 0.0% |
|
| -36.14 | 1 | 0.0% |
|
| -27.8 | 1 | 0.0% |
|
| -27.79 | 1 | 0.0% |
|
| -27.7 | 1 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 38.96 | 1 | 0.0% |
|
| 41.33 | 1 | 0.0% |
|
| 45.78 | 1 | 0.0% |
|
| 45.88 | 1 | 0.0% |
|
| 46.29 | 1 | 0.0% |
|
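ytd_return_category has only 100 distinct values, which suggests it is shared by all funds in the same category, while ytd_return_fund is fund-specific. A hypothetical relative-performance feature is the difference between the two; the toy rows reuse the ytd values of rows 0, 1 and 3 from the sample table below. `ytd_excess_return` is an assumed column name.

```python
import pandas as pd


def add_excess_return(df, fund_col, category_col, name):
    """Fund return minus its category return, in the same units (percent)."""
    out = df.copy()
    out[name] = out[fund_col] - out[category_col]
    return out


# ytd values of rows 0, 1 and 3 of the sample table
toy = pd.DataFrame({
    "ytd_return_fund": [20.19, 16.79, 11.63],
    "ytd_return_category": [19.10, 15.67, 11.29],
})
toy = add_excess_return(toy, "ytd_return_fund", "ytd_return_category", "ytd_excess_return")
print(toy.round(2))
```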
| | 2014_category_return | 2012_return_category | years_up | 2018_return_category | tag | category_return_1year | cash_percent_of_portfolio | pc_ratio | 2011_return_category | ytd_return_fund | years_down | 2014_return_fund | category_return_1month | 2013_return_fund | fund_return_3months | ytd_return_category | pb_ratio | 2017_category_return | 1_year_return_fund | pe_ratio | 2015_return_fund | portfolio_convertable | 3_months_return_category | portfolio_others | 2016_return_fund | mmc | stock_percent_of_portfolio | 2016_return_category | ps_ratio | 2011_return_fund | 2010_return_fund | fund_return_3years | 2012_fund_return | 2018_return_fund | 2017_return_fund | greatstone_rating | category_ratio_net_annual_expense | category_return_2015 | 1_month_fund_return | bond_percentage_of_porfolio | portfolio_preferred | 2010_return_category | 2013_category_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | 1.0 | -16.32 | 67922 | 13.05 | 1.19 | 5.91 | NaN | 20.19 | 2.0 | NaN | 4.20 | NaN | 20.19 | 19.10 | 1.71 | -5.78 | 18.40 | 14.51 | NaN | 0.00 | 19.10 | 0.00 | 16.14 | 19,857.41 | 98.81 | 27.30 | 1.31 | NaN | NaN | 4.24 | NaN | -12.23 | -3.31 | NaN | 1.75 | -34.98 | 4.12 | 0.00 | 0.00 | NaN | NaN |
| 1 | 10.00 | 15.34 | 5.0 | -2.09 | 134783 | 10.71 | 0.10 | 15.95 | NaN | 16.79 | 1.0 | 14.25 | 2.12 | 35.46 | 16.79 | 15.67 | 5.30 | 27.67 | 12.18 | 18.88 | 5.60 | 0.00 | 15.67 | 0.00 | 1.64 | 72,347.03 | 99.90 | 3.23 | 3.38 | NaN | NaN | 14.39 | NaN | -2.62 | 26.39 | 3.0 | 1.06 | 3.60 | 2.33 | 0.00 | 0.00 | NaN | 33.92 |
| 2 | 10.00 | 15.34 | 26.0 | -2.09 | 61271 | 10.71 | 2.00 | 15.97 | -2.46 | 17.13 | 5.0 | 11.04 | 2.12 | 30.42 | 17.13 | 15.67 | 5.40 | 27.67 | 19.77 | 23.27 | 3.68 | 0.00 | 15.67 | 0.22 | 2.32 | 68,857.43 | 97.12 | 3.23 | 3.67 | -2.23 | 17.23 | 16.42 | 15.52 | 5.04 | 25.79 | 4.0 | 1.06 | 3.60 | 3.77 | 0.58 | 0.08 | 15.53 | 33.92 |
| 3 | 10.21 | 14.57 | 11.0 | -8.53 | 64412 | 4.48 | 6.13 | 8.93 | -0.75 | 11.63 | 2.0 | 12.32 | 0.46 | 29.31 | 11.63 | 11.29 | 2.23 | 15.94 | 7.11 | 12.7 | 2.09 | 0.00 | 11.29 | 0.00 | 14.66 | 43,266.62 | 93.87 | 14.81 | 1.63 | 0.08 | 15.63 | 6.85 | 17.66 | -7.54 | 8.53 | 3.0 | 1.00 | -4.05 | 1.46 | 0.00 | 0.00 | 13.66 | 31.21 |
| 4 | NaN | NaN | 1.0 | -7.04 | 184058 | 3.17 | 6.59 | 7.59 | NaN | 10.25 | 1.0 | NaN | 1.28 | NaN | 10.25 | 10.36 | 2.02 | 18.43 | 3.11 | 14.74 | NaN | 0.09 | 10.36 | 0.80 | NaN | 43,747.9 | 67.41 | NaN | 1.4 | NaN | NaN | 0.00 | NaN | -7.37 | 17.52 | 0.0 | 0.45 | NaN | 1.28 | 24.97 | 0.02 | NaN | NaN |
# return_3year contains 17 columns with information on 3-year returns and related ratios
return_3year = pd.read_csv('Hackathon_Files/external/return_3year.csv')
pandas_profiling.ProfileReport(return_3year)
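The zero and missing-value warnings in the profile report below can be approximated with plain pandas, which is handy because the `pandas_profiling` package has since been renamed `ydata_profiling`. A minimal sketch; `zero_missing_summary` is a hypothetical helper, and it counts zeros only in numeric columns (note that `-0.0 == 0` is true, so the report's "-0.0" bucket counts as zeros).

```python
import pandas as pd


def zero_missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values and exact zeros per column, as counts and % of rows."""
    n = len(df)
    rows = []
    for col in df.columns:
        missing = int(df[col].isna().sum())
        # zeros are only meaningful for numeric columns
        zeros = int((df[col] == 0).sum()) if pd.api.types.is_numeric_dtype(df[col]) else 0
        rows.append({
            "column": col,
            "missing": missing,
            "missing_pct": round(100 * missing / n, 1),
            "zeros": zeros,
            "zeros_pct": round(100 * zeros / n, 1),
        })
    return pd.DataFrame(rows).set_index("column")


toy = pd.DataFrame({
    "3_years_alpha_category": [0.0, -0.0, -0.01, None],
    "3years_fund_std": [9.66, None, None, 4.3],
    "label": ["a", None, "b", "c"],
})
print(zero_missing_summary(toy))
```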
Dataset info
| Number of variables | 17 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 2.9% |
| Total size in memory | 3.2 MiB |
| Average record size in memory | 136.0 B |
Variables types
| Numeric | 15 |
|---|---|
| Categorical | 1 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 1 |
| Unsupported | 0 |
Warnings
- 3_years_alpha_category has 5394 / 21.6% zeros
- 3_years_alpha_fund has 1648 / 6.6% missing values
- 3_years_return_mean_annual_category has 8694 / 34.8% zeros
- 3_years_return_mean_annual_fund has 1648 / 6.6% missing values
- 3years_category_std has 352 / 1.4% zeros
- 3years_fund_r_squared has 1648 / 6.6% missing values
- 3years_fund_std has 1648 / 6.6% missing values
- 3yrs_sharpe_ratio_category has 6789 / 27.2% zeros
- 3yrs_sharpe_ratio_fund has 1648 / 6.6% missing values
- 3yrs_treynor_ratio_category has 1068 / 4.3% zeros
- 3yrs_treynor_ratio_fund has 1648 / 6.6% missing values
- 3yrs_treynor_ratio_fund has a high cardinality: 3470 distinct values
- category_beta_3years has 3200 / 12.8% zeros
- fund_beta_3years is highly skewed (γ1 = -20.832)
- fund_beta_3years has 1648 / 6.6% missing values
- fund_return_3years is highly correlated with 3_years_return_mean_annual_fund (ρ = 0.99472) and is rejected

3_years_alpha_category
Numeric
| Distinct count | 17 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -0.004579 |
|---|---|
| Minimum | -0.12 |
| Maximum | 0.11 |
| Zeros (%) | 21.6% |
Quantile statistics
| Minimum | -0.12 |
|---|---|
| 5-th percentile | -0.05 |
| Q1 | -0.01 |
| Median | -0.01 |
| Q3 | 0 |
| 95-th percentile | 0.03 |
| Maximum | 0.11 |
| Range | 0.23 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 0.023468 |
|---|---|
| Coef of variation | -5.1252 |
| Kurtosis | 2.9032 |
| Mean | -0.004579 |
| MAD | 0.016484 |
| Skewness | 0.039778 |
| Sum | -113.99 |
| Variance | 0.00055076 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| -0.01 | 7323 | 29.3% |
|
| -0.0 | 5394 | 21.6% |
|
| -0.02 | 3339 | 13.4% |
|
| 0.01 | 2609 | 10.4% |
|
| 0.03 | 1700 | 6.8% |
|
| -0.05 | 1071 | 4.3% |
|
| 0.02 | 741 | 3.0% |
|
| 0.05 | 714 | 2.9% |
|
| -0.04 | 563 | 2.3% |
|
| -0.03 | 525 | 2.1% |
|
| Other values (6) | 915 | 3.7% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -0.12 | 77 | 0.3% |
|
| -0.06 | 418 | 1.7% |
|
| -0.05 | 1071 | 4.3% |
|
| -0.04 | 563 | 2.3% |
|
| -0.03 | 525 | 2.1% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 0.04 | 230 | 0.9% |
|
| 0.05 | 714 | 2.9% |
|
| 0.06 | 15 | 0.1% |
|
| 0.08 | 160 | 0.6% |
|
| 0.11 | 15 | 0.1% |
|
3_years_alpha_fund
Numeric
| Distinct count | 2088 |
|---|---|
| Unique (%) | 8.4% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -0.57702 |
|---|---|
| Minimum | -36.24 |
| Maximum | 19.15 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | -36.24 |
|---|---|
| 5-th percentile | -6.1945 |
| Q1 | -2.1 |
| Median | -0.59 |
| Q3 | 0.89 |
| 95-th percentile | 5.07 |
| Maximum | 19.15 |
| Range | 55.39 |
| Interquartile range | 2.99 |
Descriptive statistics
| Standard deviation | 3.3798 |
|---|---|
| Coef of variation | -5.8574 |
| Kurtosis | 5.122 |
| Mean | -0.57702 |
| MAD | 2.3343 |
| Skewness | -0.32443 |
| Sum | -13475 |
| Variance | 11.423 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| -0.88 | 73 | 0.3% |
|
| -0.38 | 69 | 0.3% |
|
| -0.72 | 69 | 0.3% |
|
| -0.42 | 69 | 0.3% |
|
| -0.7 | 67 | 0.3% |
|
| -0.53 | 67 | 0.3% |
|
| -0.28 | 66 | 0.3% |
|
| -0.46 | 65 | 0.3% |
|
| -0.4 | 65 | 0.3% |
|
| -0.96 | 65 | 0.3% |
|
| Other values (2077) | 22677 | 90.7% |
|
| (Missing) | 1648 | 6.6% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -36.24 | 1 | 0.0% |
|
| -35.25 | 1 | 0.0% |
|
| -33.59 | 1 | 0.0% |
|
| -30.89 | 1 | 0.0% |
|
| -29.83 | 1 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 17.5 | 1 | 0.0% |
|
| 18.15 | 1 | 0.0% |
|
| 18.55 | 1 | 0.0% |
|
| 18.82 | 1 | 0.0% |
|
| 19.15 | 1 | 0.0% |
|
3_years_return_category
Numeric
| Distinct count | 103 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 7.4618 |
|---|---|
| Minimum | -19.79 |
| Maximum | 21.78 |
| Zeros (%) | 0.8% |
Quantile statistics
| Minimum | -19.79 |
|---|---|
| 5-th percentile | 1.62 |
| Q1 | 4.36 |
| Median | 7.44 |
| Q3 | 10.01 |
| 95-th percentile | 15.35 |
| Maximum | 21.78 |
| Range | 41.57 |
| Interquartile range | 5.65 |
Descriptive statistics
| Standard deviation | 4.4433 |
|---|---|
| Coef of variation | 0.59547 |
| Kurtosis | 2.4665 |
| Mean | 7.4618 |
| MAD | 3.5023 |
| Skewness | -0.12176 |
| Sum | 185690 |
| Variance | 19.743 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 15.35 | 1333 | 5.3% |
|
| 11.84 | 1270 | 5.1% |
|
| 10.01 | 1121 | 4.5% |
|
| 2.37 | 957 | 3.8% |
|
| 9.96 | 850 | 3.4% |
|
| 9.11 | 757 | 3.0% |
|
| 7.44 | 708 | 2.8% |
|
| 10.17 | 683 | 2.7% |
|
| 6.62 | 680 | 2.7% |
|
| 6.97 | 666 | 2.7% |
|
| Other values (92) | 15860 | 63.4% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -19.79 | 53 | 0.2% |
|
| -2.14 | 114 | 0.5% |
|
| -0.05 | 75 | 0.3% |
|
| 0.0 | 188 | 0.8% |
|
| 0.71 | 142 | 0.6% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 14.18 | 578 | 2.3% |
|
| 15.35 | 1333 | 5.3% |
|
| 15.88 | 645 | 2.6% |
|
| 16.77 | 15 | 0.1% |
|
| 21.78 | 160 | 0.6% |
|
3_years_return_mean_annual_category
Numeric
| Distinct count | 5 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.006514 |
|---|---|
| Minimum | -0.02 |
| Maximum | 0.02 |
| Zeros (%) | 34.8% |
Quantile statistics
| Minimum | -0.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.02 |
| Range | 0.04 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 0.0050391 |
|---|---|
| Coef of variation | 0.77357 |
| Kurtosis | 0.11524 |
| Mean | 0.006514 |
| MAD | 0.0046628 |
| Skewness | -0.71749 |
| Sum | 162.16 |
| Variance | 2.5392e-05 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 15972 | 63.9% |
| 0.0 | 8694 | 34.8% |
| 0.02 | 175 | 0.7% |
| -0.02 | 53 | 0.2% |
| (Missing) | 106 | 0.4% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 8694 | 34.8% |
| 0.01 | 15972 | 63.9% |
| 0.02 | 175 | 0.7% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 8694 | 34.8% |
| 0.01 | 15972 | 63.9% |
| 0.02 | 175 | 0.7% |

3_years_return_mean_annual_fund
Numeric
| Distinct count | 389 |
|---|---|
| Unique (%) | 1.6% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.63623 |
|---|---|
| Minimum | -3.19 |
| Maximum | 2.98 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -3.19 |
|---|---|
| 5-th percentile | 0.09 |
| Q1 | 0.33 |
| Median | 0.62 |
| Q3 | 0.89 |
| 95-th percentile | 1.36 |
| Maximum | 2.98 |
| Range | 6.17 |
| Interquartile range | 0.56 |
Descriptive statistics
| Standard deviation | 0.43605 |
|---|---|
| Coef of variation | 0.68536 |
| Kurtosis | 5.1738 |
| Mean | 0.63623 |
| MAD | 0.33086 |
| Skewness | -0.1965 |
| Sum | 14857 |
| Variance | 0.19014 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.14 | 266 | 1.1% |
|
| 0.8 | 265 | 1.1% |
|
| 0.63 | 255 | 1.0% |
|
| 0.62 | 255 | 1.0% |
|
| 0.17 | 252 | 1.0% |
|
| 0.15 | 249 | 1.0% |
|
| 0.76 | 249 | 1.0% |
|
| 0.18 | 245 | 1.0% |
|
| 0.19 | 243 | 1.0% |
|
| 0.2 | 243 | 1.0% |
|
| Other values (378) | 20830 | 83.3% |
|
| (Missing) | 1648 | 6.6% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| -3.19 | 1 | 0.0% |
|
| -3.14 | 1 | 0.0% |
|
| -3.11 | 1 | 0.0% |
|
| -3.09 | 2 | 0.0% |
|
| -2.92 | 3 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2.81 | 1 | 0.0% |
|
| 2.82 | 1 | 0.0% |
|
| 2.86 | 2 | 0.0% |
|
| 2.9 | 1 | 0.0% |
|
| 2.98 | 1 | 0.0% |
|
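The warnings for this file mark fund_return_3years as rejected because it is almost perfectly correlated with 3_years_return_mean_annual_fund (ρ = 0.99472). A sketch for listing such pairs directly from a DataFrame, using absolute Pearson correlation over numeric columns; the 0.95 threshold is an arbitrary illustration, not the profiler's setting, and the toy values are made up.

```python
from itertools import combinations

import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """Return (col_a, col_b, r) for numeric column pairs with |Pearson r| > threshold."""
    corr = df.select_dtypes("number").corr()
    pairs = []
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        if pd.notna(r) and abs(r) > threshold:
            pairs.append((a, b, float(r)))
    return sorted(pairs, key=lambda t: -abs(t[2]))  # strongest first


toy = pd.DataFrame({
    "3_years_return_mean_annual_fund": [0.1, 0.3, 0.6, 0.9, 1.4],
    "fund_return_3years": [1.2, 3.7, 7.4, 11.0, 16.9],
    "3years_fund_std": [9.0, 2.0, 12.0, 4.0, 7.0],
})
print(highly_correlated_pairs(toy))
```

Which member of a pair to drop is a judgment call; keeping the one with fewer missing values is one reasonable rule.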
3years_category_r_squared
Numeric
| Distinct count | 58 |
|---|---|
| Unique (%) | 0.2% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.71632 |
|---|---|
| Minimum | 0 |
| Maximum | 0.97 |
| Zeros (%) | 0.7% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.08 |
| Q1 | 0.63 |
| Median | 0.81 |
| Q3 | 0.89 |
| 95-th percentile | 0.96 |
| Maximum | 0.97 |
| Range | 0.97 |
| Interquartile range | 0.26 |
Descriptive statistics
| Standard deviation | 0.25094 |
|---|---|
| Coef of variation | 0.35032 |
| Kurtosis | 1.2269 |
| Mean | 0.71632 |
| MAD | 0.1909 |
| Skewness | -1.4381 |
| Sum | 17832 |
| Variance | 0.06297 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.84 | 2637 | 10.5% |
|
| 0.88 | 1659 | 6.6% |
|
| 0.95 | 1488 | 6.0% |
|
| 0.92 | 1485 | 5.9% |
|
| 0.81 | 1395 | 5.6% |
|
| 0.67 | 1003 | 4.0% |
|
| 0.89 | 959 | 3.8% |
|
| 0.04 | 945 | 3.8% |
|
| 0.6 | 929 | 3.7% |
|
| 0.96 | 877 | 3.5% |
|
| Other values (47) | 11517 | 46.1% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 187 | 0.7% |
|
| 0.01 | 30 | 0.1% |
|
| 0.03 | 57 | 0.2% |
|
| 0.04 | 945 | 3.8% |
|
| 0.08 | 142 | 0.6% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 0.93 | 58 | 0.2% |
|
| 0.94 | 419 | 1.7% |
|
| 0.95 | 1488 | 6.0% |
|
| 0.96 | 877 | 3.5% |
|
| 0.97 | 572 | 2.3% |
|
3years_category_std
Numeric
| Distinct count | 23 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.088854 |
|---|---|
| Minimum | 0 |
| Maximum | 0.33 |
| Zeros (%) | 1.4% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.03 |
| Q1 | 0.04 |
| Median | 0.09 |
| Q3 | 0.13 |
| 95-th percentile | 0.16 |
| Maximum | 0.33 |
| Range | 0.33 |
| Interquartile range | 0.09 |
Descriptive statistics
| Standard deviation | 0.047886 |
|---|---|
| Coef of variation | 0.53893 |
| Kurtosis | 0.46609 |
| Mean | 0.088854 |
| MAD | 0.040696 |
| Skewness | 0.32364 |
| Sum | 2211.9 |
| Variance | 0.0022931 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 0.11 | 4583 | 18.3% |
|
| 0.13 | 3456 | 13.8% |
|
| 0.03 | 3321 | 13.3% |
|
| 0.16 | 2100 | 8.4% |
|
| 0.04 | 2057 | 8.2% |
|
| 0.05 | 1584 | 6.3% |
|
| 0.07 | 1546 | 6.2% |
|
| 0.09 | 1537 | 6.1% |
|
| 0.08 | 797 | 3.2% |
|
| 0.01 | 795 | 3.2% |
|
| Other values (12) | 3118 | 12.5% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 352 | 1.4% |
|
| 0.01 | 795 | 3.2% |
|
| 0.02 | 39 | 0.2% |
|
| 0.03 | 3321 | 13.3% |
|
| 0.04 | 2057 | 8.2% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 0.18 | 222 | 0.9% |
|
| 0.2 | 53 | 0.2% |
|
| 0.24 | 77 | 0.3% |
|
| 0.27 | 15 | 0.1% |
|
| 0.33 | 57 | 0.2% |
|
3years_fund_r_squared
Numeric
| Distinct count | 6897 |
|---|---|
| Unique (%) | 27.6% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 72.558 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.53 |
| Q1 | 64.24 |
| Median | 81.91 |
| Q3 | 92.7 |
| 95-th percentile | 97.95 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 28.46 |
Descriptive statistics
| Standard deviation | 27.191 |
|---|---|
| Coef of variation | 0.37475 |
| Kurtosis | 0.88892 |
| Mean | 72.558 |
| MAD | 20.934 |
| Skewness | -1.3644 |
| Sum | 1694400 |
| Variance | 739.36 |
| Memory size | 195.4 KiB |
| Value | Count | Frequency (%) | |
| 100.0 | 65 | 0.3% |
|
| 99.99 | 31 | 0.1% |
|
| 97.42 | 23 | 0.1% |
|
| 97.34 | 22 | 0.1% |
|
| 97.01 | 21 | 0.1% |
|
| 97.68 | 21 | 0.1% |
|
| 95.52 | 21 | 0.1% |
|
| 96.26 | 21 | 0.1% |
|
| 96.31 | 20 | 0.1% |
|
| 97.08 | 20 | 0.1% |
|
| Other values (6886) | 23087 | 92.3% |
|
| (Missing) | 1648 | 6.6% |
|
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0.0 | 12 | 0.0% |
|
| 0.01 | 12 | 0.0% |
|
| 0.02 | 9 | 0.0% |
|
| 0.03 | 5 | 0.0% |
|
| 0.04 | 11 | 0.0% |
|
Maximum 5 values
| Value | Count | Frequency (%) | |
| 99.96 | 5 | 0.0% |
|
| 99.97 | 12 | 0.0% |
|
| 99.98 | 13 | 0.1% |
|
| 99.99 | 31 | 0.1% |
|
| 100.0 | 65 | 0.3% |
|
3years_fund_std
Numeric
| Distinct count | 2194 |
|---|---|
| Unique (%) | 8.8% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.053 |
|---|---|
| Minimum | 0.18 |
| Maximum | 50.49 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 0.18 |
|---|---|
| 5-th percentile | 2.0155 |
| Q1 | 4.3 |
| Median | 9.66 |
| Q3 | 12.42 |
| 95-th percentile | 16.5 |
| Maximum | 50.49 |
| Range | 50.31 |
| Interquartile range | 8.12 |
Descriptive statistics
| Standard deviation | 5.1263 |
|---|---|
| Coef of variation | 0.56625 |
| Kurtosis | 2.0319 |
| Mean | 9.053 |
| MAD | 4.197 |
| Skewness | 0.6855 |
| Sum | 211400 |
| Variance | 26.279 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.86 | 74 | 0.3% |
| 2.88 | 59 | 0.2% |
| 2.92 | 54 | 0.2% |
| 2.94 | 53 | 0.2% |
| 2.85 | 51 | 0.2% |
| 2.93 | 51 | 0.2% |
| 10.74 | 51 | 0.2% |
| 3.02 | 46 | 0.2% |
| 2.96 | 45 | 0.2% |
| 3.08 | 44 | 0.2% |
| Other values (2183) | 22824 | 91.3% |
| (Missing) | 1648 | 6.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.18 | 2 | 0.0% |
| 0.22 | 4 | 0.0% |
| 0.23 | 1 | 0.0% |
| 0.24 | 3 | 0.0% |
| 0.25 | 6 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 48.88 | 1 | 0.0% |
| 49.56 | 1 | 0.0% |
| 49.57 | 1 | 0.0% |
| 50.44 | 1 | 0.0% |
| 50.49 | 1 | 0.0% |
3yrs_sharpe_ratio_category
Numeric
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0071547 |
| Minimum | -0.01 |
| Maximum | 0.01 |
| Zeros (%) | 27.2% |
Quantile statistics
| Minimum | -0.01 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.01 |
| Range | 0.02 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 0.004641 |
|---|---|
| Coef of variation | 0.64866 |
| Kurtosis | -0.25496 |
| Mean | 0.0071547 |
| MAD | 0.004105 |
| Skewness | -1.1313 |
| Sum | 178.11 |
| Variance | 2.1539e-05 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 17958 | 71.8% |
| 0.0 | 6789 | 27.2% |
| -0.01 | 147 | 0.6% |
| (Missing) | 106 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 147 | 0.6% |
| 0.0 | 6789 | 27.2% |
| 0.01 | 17958 | 71.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 147 | 0.6% |
| 0.0 | 6789 | 27.2% |
| 0.01 | 17958 | 71.8% |
3yrs_sharpe_ratio_fund
Numeric
| Distinct count | 434 |
|---|---|
| Unique (%) | 1.7% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.68227 |
| Minimum | -4.39 |
| Maximum | 4.16 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | -4.39 |
|---|---|
| 5-th percentile | -0.08 |
| Q1 | 0.44 |
| Median | 0.74 |
| Q3 | 0.97 |
| 95-th percentile | 1.3 |
| Maximum | 4.16 |
| Range | 8.55 |
| Interquartile range | 0.53 |
Descriptive statistics
| Standard deviation | 0.4626 |
|---|---|
| Coef of variation | 0.67802 |
| Kurtosis | 4.7869 |
| Mean | 0.68227 |
| MAD | 0.34264 |
| Skewness | -0.77384 |
| Sum | 15932 |
| Variance | 0.21399 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.92 | 335 | 1.3% |
| 0.94 | 329 | 1.3% |
| 0.88 | 293 | 1.2% |
| 0.84 | 290 | 1.2% |
| 0.9 | 287 | 1.1% |
| 0.89 | 281 | 1.1% |
| 0.96 | 279 | 1.1% |
| 0.86 | 279 | 1.1% |
| 0.87 | 276 | 1.1% |
| 0.98 | 270 | 1.1% |
| Other values (423) | 20433 | 81.7% |
| (Missing) | 1648 | 6.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -4.39 | 1 | 0.0% |
| -3.19 | 1 | 0.0% |
| -2.85 | 1 | 0.0% |
| -2.81 | 1 | 0.0% |
| -2.72 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.28 | 1 | 0.0% |
| 3.75 | 1 | 0.0% |
| 3.77 | 1 | 0.0% |
| 3.78 | 1 | 0.0% |
| 4.16 | 1 | 0.0% |
3yrs_treynor_ratio_category
Numeric
| Distinct count | 29 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.069803 |
| Minimum | -0.76 |
| Maximum | 0.3 |
| Zeros (%) | 4.3% |
Quantile statistics
| Minimum | -0.76 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.05 |
| Median | 0.06 |
| Q3 | 0.1 |
| 95-th percentile | 0.18 |
| Maximum | 0.3 |
| Range | 1.06 |
| Interquartile range | 0.05 |
Descriptive statistics
| Standard deviation | 0.068808 |
|---|---|
| Coef of variation | 0.98575 |
| Kurtosis | 9.5101 |
| Mean | 0.069803 |
| MAD | 0.042991 |
| Skewness | -0.21323 |
| Sum | 1737.7 |
| Variance | 0.0047346 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.06 | 5117 | 20.5% |
| 0.05 | 3018 | 12.1% |
| 0.01 | 2419 | 9.7% |
| 0.08 | 2225 | 8.9% |
| 0.11 | 1855 | 7.4% |
| 0.13 | 1333 | 5.3% |
| 0.09 | 1303 | 5.2% |
| 0.03 | 1240 | 5.0% |
| 0.1 | 1225 | 4.9% |
| -0.0 | 1068 | 4.3% |
| Other values (18) | 4091 | 16.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.76 | 2 | 0.0% |
| -0.28 | 230 | 0.9% |
| -0.16 | 51 | 0.2% |
| -0.05 | 113 | 0.5% |
| -0.03 | 94 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.18 | 15 | 0.1% |
| 0.19 | 218 | 0.9% |
| 0.25 | 302 | 1.2% |
| 0.26 | 50 | 0.2% |
| 0.3 | 664 | 2.7% |
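Several common-value tables list `-0.0` as a value. For 3yrs_treynor_ratio_category, the `-0.0` share (4.3%) equals the reported Zeros (%). That suggests these entries are IEEE-754 negative zeros, which compare equal to 0.0, rather than genuinely negative values.

If the sign is a nuisance when displaying or grouping values, the minimal sketch below maps them to +0.0. It relies on the IEEE rule that `-0.0 + 0.0` is `+0.0`, while adding 0.0 leaves every other float (and NaN) unchanged. The helper name is hypothetical.

```python
import numpy as np
import pandas as pd

def normalize_negative_zero(s: pd.Series) -> pd.Series:
    """Turn -0.0 into +0.0; all other values, including NaN, are unchanged (hypothetical helper)."""
    # Under IEEE-754 round-to-nearest, -0.0 + 0.0 == +0.0 and x + 0.0 == x for any other x
    return s + 0.0

s = pd.Series([-0.0, 0.0, 0.01, np.nan, -0.01])
print(np.signbit(s.to_numpy()))                           # the first entry carries a minus sign
print(np.signbit(normalize_negative_zero(s).to_numpy()))  # only -0.01 keeps its minus sign
```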
3yrs_treynor_ratio_fund
Categorical
| Distinct count | 3470 |
|---|---|
| Unique (%) | 13.9% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.5 | 46 | 0.2% |
| 5.96 | 45 | 0.2% |
| 6.2 | 45 | 0.2% |
| 6.1 | 43 | 0.2% |
| 5.53 | 41 | 0.2% |
| 6.12 | 40 | 0.2% |
| 5.72 | 40 | 0.2% |
| 5.7 | 40 | 0.2% |
| 5.97 | 39 | 0.2% |
| 5.87 | 39 | 0.2% |
| Other values (3459) | 22934 | 91.7% |
| (Missing) | 1648 | 6.6% |
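3yrs_treynor_ratio_fund is profiled as Categorical even though its most common values (5.5, 5.96, 6.2, ...) are numbers. One plausible cause, which is an assumption the report does not confirm, is that some entries are non-numeric strings. Such strings force pandas to read the whole column with object dtype.

The sketch below lists the distinct entries that fail numeric parsing. The helper name is hypothetical.

```python
import pandas as pd

def non_numeric_entries(s: pd.Series) -> pd.Series:
    """Count the distinct non-missing values of `s` that cannot be parsed as numbers (hypothetical helper)."""
    parsed = pd.to_numeric(s, errors="coerce")  # unparseable strings become NaN
    bad = s[parsed.isna() & s.notna()]          # NaN after parsing, but not missing before
    return bad.value_counts()

# Illustrative values, not taken from the dataset
print(non_numeric_entries(pd.Series(["5.5", "6.2", "-", None, "1,234.5", "7"])))
```

On the real data, assuming the 3-year file has been loaded into a DataFrame such as `return_3year` (that cell is not part of this section), this would be `non_numeric_entries(return_3year["3yrs_treynor_ratio_fund"])`.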
category_beta_3years
Numeric
| Distinct count | 6 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.008782 |
| Minimum | -0.01 |
| Maximum | 0.03 |
| Zeros (%) | 12.8% |
Quantile statistics
| Minimum | -0.01 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.01 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.03 |
| Range | 0.04 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 0.0036776 |
|---|---|
| Coef of variation | 0.41876 |
| Kurtosis | 4.5323 |
| Mean | 0.008782 |
| MAD | 0.0023377 |
| Skewness | -1.5548 |
| Sum | 218.62 |
| Variance | 1.3525e-05 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 21392 | 85.6% |
| 0.0 | 3200 | 12.8% |
| 0.02 | 224 | 0.9% |
| -0.01 | 53 | 0.2% |
| 0.03 | 25 | 0.1% |
| (Missing) | 106 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 53 | 0.2% |
| 0.0 | 3200 | 12.8% |
| 0.01 | 21392 | 85.6% |
| 0.02 | 224 | 0.9% |
| 0.03 | 25 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 53 | 0.2% |
| 0.0 | 3200 | 12.8% |
| 0.01 | 21392 | 85.6% |
| 0.02 | 224 | 0.9% |
| 0.03 | 25 | 0.1% |
fund_beta_3years
Numeric
| Distinct count | 357 |
|---|---|
| Unique (%) | 1.4% |
| Missing (%) | 6.6% |
| Missing (n) | 1648 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.91025 |
| Minimum | -39.66 |
| Maximum | 22.57 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -39.66 |
|---|---|
| 5-th percentile | 0.17 |
| Q1 | 0.77 |
| Median | 0.98 |
| Q3 | 1.14 |
| 95-th percentile | 1.44 |
| Maximum | 22.57 |
| Range | 62.23 |
| Interquartile range | 0.37 |
Descriptive statistics
| Standard deviation | 0.63713 |
|---|---|
| Coef of variation | 0.69995 |
| Kurtosis | 1505.1 |
| Mean | 0.91025 |
| MAD | 0.29819 |
| Skewness | -20.832 |
| Sum | 21256 |
| Variance | 0.40593 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 581 | 2.3% |
| 1.03 | 524 | 2.1% |
| 0.98 | 495 | 2.0% |
| 1.02 | 494 | 2.0% |
| 0.96 | 425 | 1.7% |
| 1.04 | 409 | 1.6% |
| 1.08 | 406 | 1.6% |
| 1.01 | 377 | 1.5% |
| 1.1 | 376 | 1.5% |
| 1.06 | 374 | 1.5% |
| Other values (346) | 18891 | 75.6% |
| (Missing) | 1648 | 6.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -39.66 | 1 | 0.0% |
| -39.59 | 1 | 0.0% |
| -11.46 | 1 | 0.0% |
| -11.43 | 2 | 0.0% |
| -6.72 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.38 | 1 | 0.0% |
| 12.8 | 1 | 0.0% |
| 12.81 | 1 | 0.0% |
| 12.82 | 1 | 0.0% |
| 22.57 | 1 | 0.0% |
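fund_beta_3years has a kurtosis of about 1505, driven by a handful of extreme betas (-39.66, 22.57) far outside the interquartile range 0.77 to 1.14. Tukey's fences are one common way to flag such values, though not necessarily the approach used later in this analysis. Values outside [Q1 - k·IQR, Q3 + k·IQR] are marked.

With the default k = 1.5, the fences for this column would be 0.215 and 1.695. Since the 5th percentile is 0.17, at least 5% of funds would fall below the lower fence, so a larger k may suit better if only the extreme betas should be isolated. The helper name is hypothetical.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # quantiles ignore NaN
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN are False, so missing values are never flagged
    return (s < lower) | (s > upper)

# Illustrative betas, not taken from the dataset
print(iqr_outliers(pd.Series([0.9, 1.0, 1.1, 0.95, 1.05, -39.66, 22.57, None])))
```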
fund_return_3years
Highly correlated
This variable is highly correlated with 3_years_return_mean_annual_fund and should be ignored for analysis
| Correlation | 0.99472 |
|---|---|
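The profiler rejects one column from each pair whose Pearson correlation exceeds its threshold. The exact cut-off depends on the profiling version's settings, so the 0.9 used below is an assumption. The sketch lists such pairs for any numeric DataFrame; the helper name is hypothetical.

The identifier column `tag` should be excluded before calling it. The rejected column can then be dropped explicitly, e.g. `return_3year.drop(columns=["fund_return_3years"])`, assuming the 3-year data sits in a DataFrame named `return_3year`.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, rho) for numeric column pairs whose |Pearson rho| exceeds threshold."""
    # Pairwise Pearson correlation; missing values are excluded pair by pair
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            rho = corr.iloc[i, j]
            if abs(rho) > threshold:
                pairs.append((cols[i], cols[j], float(rho)))
    return pairs

# Illustrative data: "b" is almost a rescaled copy of "a", "c" is unrelated noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({"a": a, "b": 2 * a + 0.01 * rng.normal(size=200), "c": rng.normal(size=200)})
print(highly_correlated_pairs(demo))  # reports only the ("a", "b") pair
```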
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |
| | tag | 3yrs_treynor_ratio_fund | 3_years_alpha_fund | 3years_category_std | 3yrs_sharpe_ratio_fund | 3yrs_treynor_ratio_category | 3_years_return_mean_annual_fund | fund_beta_3years | 3years_fund_r_squared | 3years_fund_std | category_beta_3years | fund_return_3years | 3_years_alpha_category | 3_years_return_mean_annual_category | 3yrs_sharpe_ratio_category | 3years_category_r_squared | 3_years_return_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67922 | 2.46 | -7.10 | 0.18 | 0.26 | 0.05 | 0.45 | 1.20 | 54.83 | 16.25 | 0.01 | 4.24 | -0.04 | 0.01 | 0.00 | 0.42 | 7.36 |
| 1 | 134783 | 12.2 | 0.07 | 0.13 | 1.06 | 0.13 | 1.19 | 1.07 | 88.46 | 12.26 | 0.01 | 14.39 | 0.01 | 0.01 | 0.01 | 0.84 | 15.35 |
| 2 | 61271 | 17.88 | 4.32 | 0.13 | 1.46 | 0.13 | 1.32 | 0.85 | 84.41 | 9.93 | 0.01 | 16.42 | 0.01 | 0.01 | 0.01 | 0.84 | 15.35 |
| 3 | 64412 | 7.93 | -2.73 | 0.11 | 0.68 | 0.09 | 0.58 | 0.70 | 81.02 | 8.36 | 0.01 | 6.85 | -0.02 | 0.01 | 0.01 | 0.84 | 10.01 |
| 4 | 184058 | NaN | NaN | 0.08 | NaN | 0.06 | NaN | NaN | NaN | NaN | 0.01 | 0.00 | -0.01 | 0.01 | 0.01 | 0.97 | 9.13 |
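Every `tag` appears exactly once (25000 distinct values, none missing), and the introduction notes that `tag` is the same as the fund id. That makes it the natural join key when assembling one feature table from the separate CSV files.

A minimal sketch follows; the helper name is hypothetical. `validate="one_to_one"` makes pandas raise `MergeError` if a tag is ever duplicated in either table. The example rows reuse the first two tags from the sample tables.

```python
import pandas as pd

def merge_on_tag(frames):
    """Inner-join a list of per-fund tables on the shared `tag` identifier (hypothetical helper)."""
    merged = frames[0]
    for other in frames[1:]:
        # validate="one_to_one" raises pandas.errors.MergeError on duplicated tags
        merged = merged.merge(other, on="tag", how="inner", validate="one_to_one")
    return merged

r3 = pd.DataFrame({"tag": [67922, 134783], "fund_beta_3years": [1.20, 1.07]})
r5 = pd.DataFrame({"tag": [134783, 67922], "5_years_beta_fund": [1.05, None]})
print(merge_on_tag([r3, r5]))
```

On the full data this would be `merge_on_tag([return_3year, return_5year])`, assuming both files have been read into DataFrames with those names.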
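# return_5year contains 17 columns describing each fund's 5-year returns and risk measures (alpha, beta, R-squared, standard deviation, Sharpe and Treynor ratios), the same measures for its category, and the tag identifier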
return_5year = pd.read_csv('Hackathon_Files/external/return_5year.csv')
pandas_profiling.ProfileReport(return_5year)
Dataset info
| Number of variables | 17 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 6.5% |
| Total size in memory | 3.2 MiB |
| Average record size in memory | 136.0 B |
Variables types
| Numeric | 15 |
|---|---|
| Categorical | 1 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 0 |
| Rejected | 1 |
| Unsupported | 0 |
Warnings
- 5_years_alpha_category has 7500 / 30.0% zeros (Zeros)
- 5_years_alpha_fund has 3843 / 15.4% missing values (Missing)
- 5_years_beta_category has 3603 / 14.4% zeros (Zeros)
- 5_years_beta_fund has 3843 / 15.4% missing values (Missing)
- 5_years_return_fund is highly correlated with 5_years_return_mean_annual_fund (ρ = 0.98935) (Rejected)
- 5_years_return_mean_annual_category has 14722 / 58.9% zeros (Zeros)
- 5_years_return_mean_annual_fund has 3843 / 15.4% missing values (Missing)
- 5years_fund_r_squared has 3843 / 15.4% missing values (Missing)
- 5years_fund_std has 3843 / 15.4% missing values (Missing)
- 5yrs_sharpe_ratio_category has 9702 / 38.8% zeros (Zeros)
- 5yrs_sharpe_ratio_fund has 3843 / 15.4% missing values (Missing)
- 5yrs_treynor_ratio_category has 1125 / 4.5% zeros (Zeros)
- 5yrs_treynor_ratio_fund has 3843 / 15.4% missing values (Missing)
- 5yrs_treynor_ratio_fund has a high cardinality: 2834 distinct values (Warning)

5_years_alpha_category
Numeric
| Distinct count | 19 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -0.0081208 |
| Minimum | -0.18 |
| Maximum | 0.08 |
| Zeros (%) | 30.0% |
Quantile statistics
| Minimum | -0.18 |
|---|---|
| 5-th percentile | -0.06 |
| Q1 | -0.02 |
| Median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.04 |
| Maximum | 0.08 |
| Range | 0.26 |
| Interquartile range | 0.02 |
Descriptive statistics
| Standard deviation | 0.026415 |
|---|---|
| Coef of variation | -3.2527 |
| Kurtosis | 6.5381 |
| Mean | -0.0081208 |
| MAD | 0.017991 |
| Skewness | -1.0094 |
| Sum | -202.16 |
| Variance | 0.00069773 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.0 | 7500 | 30.0% |
| -0.01 | 4837 | 19.3% |
| -0.03 | 3107 | 12.4% |
| 0.01 | 2300 | 9.2% |
| -0.02 | 2163 | 8.7% |
| 0.04 | 1176 | 4.7% |
| 0.02 | 1030 | 4.1% |
| -0.06 | 686 | 2.7% |
| -0.07 | 418 | 1.7% |
| -0.04 | 404 | 1.6% |
| Other values (8) | 1273 | 5.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.18 | 77 | 0.3% |
| -0.11 | 106 | 0.4% |
| -0.08 | 93 | 0.4% |
| -0.07 | 418 | 1.7% |
| -0.06 | 686 | 2.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.03 | 303 | 1.2% |
| 0.04 | 1176 | 4.7% |
| 0.05 | 30 | 0.1% |
| 0.06 | 104 | 0.4% |
| 0.08 | 175 | 0.7% |
5_years_alpha_fund
Numeric
| Distinct count | 2017 |
|---|---|
| Unique (%) | 8.1% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -0.83676 |
| Minimum | -34.57 |
| Maximum | 15.05 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | -34.57 |
|---|---|
| 5-th percentile | -6.39 |
| Q1 | -2.12 |
| Median | -0.49 |
| Q3 | 0.7 |
| 95-th percentile | 3.84 |
| Maximum | 15.05 |
| Range | 49.62 |
| Interquartile range | 2.82 |
Descriptive statistics
| Standard deviation | 3.3011 |
|---|---|
| Coef of variation | -3.9451 |
| Kurtosis | 7.3809 |
| Mean | -0.83676 |
| MAD | 2.2429 |
| Skewness | -1.1349 |
| Sum | -17703 |
| Variance | 10.897 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.16 | 69 | 0.3% |
| -0.12 | 66 | 0.3% |
| -0.2 | 64 | 0.3% |
| -0.06 | 62 | 0.2% |
| 0.16 | 62 | 0.2% |
| -0.55 | 61 | 0.2% |
| -0.18 | 60 | 0.2% |
| -0.1 | 60 | 0.2% |
| -0.34 | 60 | 0.2% |
| -0.38 | 60 | 0.2% |
| Other values (2006) | 20533 | 82.1% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -34.57 | 1 | 0.0% |
| -33.59 | 1 | 0.0% |
| -30.58 | 1 | 0.0% |
| -30.22 | 1 | 0.0% |
| -29.84 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13.32 | 1 | 0.0% |
| 14.02 | 1 | 0.0% |
| 14.82 | 1 | 0.0% |
| 14.96 | 1 | 0.0% |
| 15.05 | 1 | 0.0% |
5_years_beta_category
Numeric
| Distinct count | 6 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0086101 |
| Minimum | -0.02 |
| Maximum | 0.03 |
| Zeros (%) | 14.4% |
Quantile statistics
| Minimum | -0.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.01 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.03 |
| Range | 0.05 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 0.0040121 |
|---|---|
| Coef of variation | 0.46598 |
| Kurtosis | 7.8208 |
| Mean | 0.0086101 |
| MAD | 0.0026142 |
| Skewness | -1.6648 |
| Sum | 214.34 |
| Variance | 1.6097e-05 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 20989 | 84.0% |
| 0.0 | 3603 | 14.4% |
| 0.02 | 196 | 0.8% |
| -0.02 | 53 | 0.2% |
| 0.03 | 53 | 0.2% |
| (Missing) | 106 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 3603 | 14.4% |
| 0.01 | 20989 | 84.0% |
| 0.02 | 196 | 0.8% |
| 0.03 | 53 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 3603 | 14.4% |
| 0.01 | 20989 | 84.0% |
| 0.02 | 196 | 0.8% |
| 0.03 | 53 | 0.2% |
5_years_beta_fund
Numeric
| Distinct count | 340 |
|---|---|
| Unique (%) | 1.4% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.89786 |
| Minimum | -38.85 |
| Maximum | 24.72 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -38.85 |
|---|---|
| 5-th percentile | 0.18 |
| Q1 | 0.77 |
| Median | 0.97 |
| Q3 | 1.1 |
| 95-th percentile | 1.4 |
| Maximum | 24.72 |
| Range | 63.57 |
| Interquartile range | 0.33 |
Descriptive statistics
| Standard deviation | 0.6422 |
|---|---|
| Coef of variation | 0.71526 |
| Kurtosis | 1514.9 |
| Mean | 0.89786 |
| MAD | 0.27947 |
| Skewness | -19.761 |
| Sum | 18996 |
| Variance | 0.41242 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 599 | 2.4% |
| 1.02 | 517 | 2.1% |
| 1.06 | 478 | 1.9% |
| 1.04 | 437 | 1.7% |
| 0.92 | 429 | 1.7% |
| 0.99 | 424 | 1.7% |
| 0.96 | 423 | 1.7% |
| 0.94 | 419 | 1.7% |
| 1.08 | 405 | 1.6% |
| 1.01 | 403 | 1.6% |
| Other values (329) | 16623 | 66.5% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -38.85 | 1 | 0.0% |
| -38.77 | 1 | 0.0% |
| -9.98 | 1 | 0.0% |
| -9.92 | 1 | 0.0% |
| -9.89 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.68 | 1 | 0.0% |
| 11.7 | 1 | 0.0% |
| 11.73 | 1 | 0.0% |
| 15.21 | 1 | 0.0% |
| 24.72 | 1 | 0.0% |
5_years_return_category
Numeric
| Distinct count | 100 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4.7573 |
| Minimum | -17 |
| Maximum | 15.26 |
| Zeros (%) | 0.8% |
Quantile statistics
| Minimum | -17 |
|---|---|
| 5-th percentile | 0.87 |
| Q1 | 2.61 |
| Median | 4.23 |
| Q3 | 6.41 |
| 95-th percentile | 11.26 |
| Maximum | 15.26 |
| Range | 32.26 |
| Interquartile range | 3.8 |
Descriptive statistics
| Standard deviation | 3.4094 |
|---|---|
| Coef of variation | 0.71666 |
| Kurtosis | 4.9594 |
| Mean | 4.7573 |
| MAD | 2.5855 |
| Skewness | -0.63204 |
| Sum | 118390 |
| Variance | 11.624 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.26 | 1333 | 5.3% |
| 8.91 | 1270 | 5.1% |
| 7.2 | 1121 | 4.5% |
| 2.51 | 957 | 3.8% |
| 5.89 | 926 | 3.7% |
| 3.45 | 854 | 3.4% |
| 2.61 | 757 | 3.0% |
| 5.12 | 708 | 2.8% |
| 5.62 | 683 | 2.7% |
| 2.1 | 680 | 2.7% |
| Other values (89) | 15596 | 62.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -17.0 | 53 | 0.2% |
| -11.01 | 75 | 0.3% |
| -8.35 | 109 | 0.4% |
| -4.25 | 106 | 0.4% |
| -2.08 | 2 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.99 | 578 | 2.3% |
| 9.53 | 199 | 0.8% |
| 10.02 | 101 | 0.4% |
| 11.26 | 1333 | 5.3% |
| 15.26 | 160 | 0.6% |
5_years_return_fund
Highly correlated
This variable is highly correlated with 5_years_return_mean_annual_fund and should be ignored for analysis
| Correlation | 0.98935 |
|---|---|
5_years_return_mean_annual_category
Numeric
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0038949 |
| Minimum | -0.01 |
| Maximum | 0.01 |
| Zeros (%) | 58.9% |
Quantile statistics
| Minimum | -0.01 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.01 |
| Range | 0.02 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 0.0050687 |
|---|---|
| Coef of variation | 1.3014 |
| Kurtosis | -1.414 |
| Mean | 0.0038949 |
| MAD | 0.0048725 |
| Skewness | 0.23203 |
| Sum | 96.96 |
| Variance | 2.5692e-05 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.0 | 14722 | 58.9% |
| 0.01 | 9934 | 39.7% |
| -0.01 | 238 | 1.0% |
| (Missing) | 106 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 238 | 1.0% |
| -0.0 | 14722 | 58.9% |
| 0.01 | 9934 | 39.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 238 | 1.0% |
| -0.0 | 14722 | 58.9% |
| 0.01 | 9934 | 39.7% |
5_years_return_mean_annual_fund
Numeric
| Distinct count | 329 |
|---|---|
| Unique (%) | 1.3% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.43748 |
| Minimum | -2.96 |
| Maximum | 2.49 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -2.96 |
|---|---|
| 5-th percentile | 0.03 |
| Q1 | 0.23 |
| Median | 0.41 |
| Q3 | 0.63 |
| 95-th percentile | 1.01 |
| Maximum | 2.49 |
| Range | 5.45 |
| Interquartile range | 0.4 |
Descriptive statistics
| Standard deviation | 0.34125 |
|---|---|
| Coef of variation | 0.78003 |
| Kurtosis | 7.5068 |
| Mean | 0.43748 |
| MAD | 0.25249 |
| Skewness | -0.62923 |
| Sum | 9255.7 |
| Variance | 0.11645 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.23 | 348 | 1.4% |
| 0.24 | 348 | 1.4% |
| 0.3 | 341 | 1.4% |
| 0.22 | 334 | 1.3% |
| 0.29 | 334 | 1.3% |
| 0.25 | 329 | 1.3% |
| 0.27 | 328 | 1.3% |
| 0.21 | 325 | 1.3% |
| 0.28 | 322 | 1.3% |
| 0.34 | 320 | 1.3% |
| Other values (318) | 17828 | 71.3% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2.96 | 1 | 0.0% |
| -2.9 | 1 | 0.0% |
| -2.88 | 1 | 0.0% |
| -2.86 | 2 | 0.0% |
| -2.85 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.36 | 1 | 0.0% |
| 2.38 | 1 | 0.0% |
| 2.44 | 1 | 0.0% |
| 2.45 | 2 | 0.0% |
| 2.49 | 1 | 0.0% |
5years_category_std
Numeric
| Distinct count | 26 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.093149 |
| Minimum | 0 |
| Maximum | 0.36 |
| Zeros (%) | 0.7% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.02 |
| Q1 | 0.05 |
| Median | 0.1 |
| Q3 | 0.13 |
| 95-th percentile | 0.16 |
| Maximum | 0.36 |
| Range | 0.36 |
| Interquartile range | 0.08 |
Descriptive statistics
| Standard deviation | 0.0496 |
|---|---|
| Coef of variation | 0.53248 |
| Kurtosis | 1.0779 |
| Mean | 0.093149 |
| MAD | 0.041488 |
| Skewness | 0.36764 |
| Sum | 2318.9 |
| Variance | 0.0024602 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.03 | 3226 | 12.9% |
| 0.12 | 3134 | 12.5% |
| 0.13 | 2586 | 10.3% |
| 0.11 | 2248 | 9.0% |
| 0.05 | 1766 | 7.1% |
| 0.08 | 1648 | 6.6% |
| 0.15 | 1444 | 5.8% |
| 0.09 | 1298 | 5.2% |
| 0.04 | 980 | 3.9% |
| 0.01 | 960 | 3.8% |
| Other values (15) | 5604 | 22.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 187 | 0.7% |
| 0.01 | 960 | 3.8% |
| 0.02 | 265 | 1.1% |
| 0.03 | 3226 | 12.9% |
| 0.04 | 980 | 3.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.21 | 54 | 0.2% |
| 0.22 | 53 | 0.2% |
| 0.26 | 77 | 0.3% |
| 0.28 | 15 | 0.1% |
| 0.36 | 57 | 0.2% |
5years_fund_r_squared
Numeric
| Distinct count | 6488 |
|---|---|
| Unique (%) | 26.0% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 72.453 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.23 |
| Q1 | 64.26 |
| Median | 82.36 |
| Q3 | 92.52 |
| 95-th percentile | 97.35 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 28.26 |
Descriptive statistics
| Standard deviation | 27.494 |
|---|---|
| Coef of variation | 0.37948 |
| Kurtosis | 0.96791 |
| Mean | 72.453 |
| MAD | 21.093 |
| Skewness | -1.4073 |
| Sum | 1532900 |
| Variance | 755.95 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 100.0 | 59 | 0.2% |
| 99.99 | 39 | 0.2% |
| 96.82 | 28 | 0.1% |
| 95.38 | 26 | 0.1% |
| 95.44 | 23 | 0.1% |
| 95.43 | 21 | 0.1% |
| 95.4 | 20 | 0.1% |
| 95.16 | 20 | 0.1% |
| 95.06 | 19 | 0.1% |
| 0.0 | 19 | 0.1% |
| Other values (6477) | 20883 | 83.5% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 19 | 0.1% |
| 0.01 | 16 | 0.1% |
| 0.02 | 18 | 0.1% |
| 0.03 | 6 | 0.0% |
| 0.04 | 14 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.95 | 3 | 0.0% |
| 99.97 | 5 | 0.0% |
| 99.98 | 15 | 0.1% |
| 99.99 | 39 | 0.2% |
| 100.0 | 59 | 0.2% |
5years_fund_std
Numeric
| Distinct count | 2179 |
|---|---|
| Unique (%) | 8.7% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.4574 |
| Minimum | 0.17 |
| Maximum | 56.67 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 0.17 |
|---|---|
| 5-th percentile | 2.068 |
| Q1 | 4.67 |
| Median | 10.34 |
| Q3 | 12.83 |
| 95-th percentile | 16.77 |
| Maximum | 56.67 |
| Range | 56.5 |
| Interquartile range | 8.16 |
Descriptive statistics
| Standard deviation | 5.3224 |
|---|---|
| Coef of variation | 0.56278 |
| Kurtosis | 2.7525 |
| Mean | 9.4574 |
| MAD | 4.3377 |
| Skewness | 0.71357 |
| Sum | 200090 |
| Variance | 28.328 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.74 | 71 | 0.3% |
| 11.18 | 60 | 0.2% |
| 2.84 | 57 | 0.2% |
| 2.66 | 54 | 0.2% |
| 2.83 | 54 | 0.2% |
| 2.72 | 52 | 0.2% |
| 2.76 | 50 | 0.2% |
| 11.5 | 48 | 0.2% |
| 2.8 | 47 | 0.2% |
| 2.86 | 47 | 0.2% |
| Other values (2168) | 20617 | 82.5% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.17 | 1 | 0.0% |
| 0.2 | 1 | 0.0% |
| 0.25 | 1 | 0.0% |
| 0.26 | 3 | 0.0% |
| 0.27 | 1 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 50.79 | 1 | 0.0% |
| 52.97 | 1 | 0.0% |
| 53.07 | 1 | 0.0% |
| 56.61 | 1 | 0.0% |
| 56.67 | 1 | 0.0% |
5yrs_sharpe_ratio_category
Numeric
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0059733 |
| Minimum | -0.01 |
| Maximum | 0.01 |
| Zeros (%) | 38.8% |
Quantile statistics
| Minimum | -0.01 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.01 |
| Range | 0.02 |
| Interquartile range | 0.01 |
Descriptive statistics
| Standard deviation | 0.0050346 |
|---|---|
| Coef of variation | 0.84285 |
| Kurtosis | -1.3252 |
| Mean | 0.0059733 |
| MAD | 0.0048626 |
| Skewness | -0.54861 |
| Sum | 148.7 |
| Variance | 2.5347e-05 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 15031 | 60.1% |
| -0.0 | 9702 | 38.8% |
| -0.01 | 161 | 0.6% |
| (Missing) | 106 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 161 | 0.6% |
| -0.0 | 9702 | 38.8% |
| 0.01 | 15031 | 60.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 161 | 0.6% |
| -0.0 | 9702 | 38.8% |
| 0.01 | 15031 | 60.1% |
5yrs_sharpe_ratio_fund
Numeric
| Distinct count | 345 |
|---|---|
| Unique (%) | 1.4% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.51784 |
| Minimum | -5.14 |
| Maximum | 3.22 |
| Zeros (%) | 0.3% |
Quantile statistics
| Minimum | -5.14 |
|---|---|
| 5-th percentile | -0.08 |
| Q1 | 0.33 |
| Median | 0.55 |
| Q3 | 0.73 |
| 95-th percentile | 1.02 |
| Maximum | 3.22 |
| Range | 8.36 |
| Interquartile range | 0.4 |
Descriptive statistics
| Standard deviation | 0.36255 |
|---|---|
| Coef of variation | 0.70012 |
| Kurtosis | 7.0212 |
| Mean | 0.51784 |
| MAD | 0.26252 |
| Skewness | -0.81959 |
| Sum | 10956 |
| Variance | 0.13144 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.6 | 382 | 1.5% |
| 0.56 | 373 | 1.5% |
| 0.58 | 364 | 1.5% |
| 0.52 | 348 | 1.4% |
| 0.66 | 341 | 1.4% |
| 0.62 | 340 | 1.4% |
| 0.54 | 338 | 1.4% |
| 0.57 | 327 | 1.3% |
| 0.7 | 316 | 1.3% |
| 0.64 | 311 | 1.2% |
| Other values (334) | 17717 | 70.9% |
| (Missing) | 3843 | 15.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -5.14 | 1 | 0.0% |
| -2.64 | 1 | 0.0% |
| -2.56 | 1 | 0.0% |
| -2.44 | 1 | 0.0% |
| -1.84 | 2 | 0.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.66 | 1 | 0.0% |
| 2.9 | 1 | 0.0% |
| 2.98 | 1 | 0.0% |
| 3.02 | 1 | 0.0% |
| 3.22 | 2 | 0.0% |
5yrs_treynor_ratio_category
Numeric
| Distinct count | 25 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.040969 |
| Minimum | -0.16 |
| Maximum | 0.32 |
| Zeros (%) | 4.5% |
Quantile statistics
| Minimum | -0.16 |
|---|---|
| 5-th percentile | -0.01 |
| Q1 | 0.02 |
| Median | 0.04 |
| Q3 | 0.07 |
| 95-th percentile | 0.1 |
| Maximum | 0.32 |
| Range | 0.48 |
| Interquartile range | 0.05 |
Descriptive statistics
| Standard deviation | 0.044379 |
|---|---|
| Coef of variation | 1.0832 |
| Kurtosis | 7.1312 |
| Mean | 0.040969 |
| MAD | 0.0288 |
| Skewness | -0.46642 |
| Sum | 1019.9 |
| Variance | 0.0019695 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.04 | 4525 | 18.1% |
| 0.02 | 3590 | 14.4% |
| 0.03 | 3051 | 12.2% |
| 0.07 | 2270 | 9.1% |
| 0.05 | 2125 | 8.5% |
| 0.06 | 1644 | 6.6% |
| 0.1 | 1333 | 5.3% |
| 0.01 | 1318 | 5.3% |
| 0.08 | 1273 | 5.1% |
| -0.0 | 1125 | 4.5% |
| Other values (14) | 2640 | 10.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.16 | 2 | 0.0% |
| -0.13 | 410 | 1.6% |
| -0.1 | 230 | 0.9% |
| -0.09 | 142 | 0.6% |
| -0.08 | 77 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1 | 1333 | 5.3% |
| 0.11 | 677 | 2.7% |
| 0.12 | 454 | 1.8% |
| 0.25 | 50 | 0.2% |
| 0.32 | 51 | 0.2% |
5yrs_treynor_ratio_fund
Categorical
| Distinct count | 2834 |
|---|---|
| Unique (%) | 11.3% |
| Missing (%) | 15.4% |
| Missing (n) | 3843 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.56 | 46 | 0.2% |
| 3.8 | 45 | 0.2% |
| 3.84 | 45 | 0.2% |
| 3.64 | 43 | 0.2% |
| 2.8 | 41 | 0.2% |
| 3.32 | 41 | 0.2% |
| 4.18 | 41 | 0.2% |
| 3.86 | 40 | 0.2% |
| 2.92 | 40 | 0.2% |
| 2.48 | 39 | 0.2% |
| Other values (2823) | 20736 | 82.9% |
| (Missing) | 3843 | 15.4% |
category_r_squared_5years
Numeric
| Distinct count | 60 |
|---|---|
| Unique (%) | 0.2% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.71275 |
| Minimum | 0 |
| Maximum | 0.97 |
| Zeros (%) | 0.9% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.04 |
| Q1 | 0.64 |
| Median | 0.83 |
| Q3 | 0.89 |
| 95-th percentile | 0.95 |
| Maximum | 0.97 |
| Range | 0.97 |
| Interquartile range | 0.25 |
Descriptive statistics
| Standard deviation | 0.26011 |
|---|---|
| Coef of variation | 0.36494 |
| Kurtosis | 1.0817 |
| Mean | 0.71275 |
| MAD | 0.20099 |
| Skewness | -1.4307 |
| Sum | 17743 |
| Variance | 0.067658 |
| Memory size | 195.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.86 | 2480 | 9.9% |
| 0.89 | 1684 | 6.7% |
| 0.78 | 1431 | 5.7% |
| 0.84 | 1423 | 5.7% |
| 0.93 | 1413 | 5.7% |
| 0.65 | 1353 | 5.4% |
| 0.94 | 1027 | 4.1% |
| 0.96 | 957 | 3.8% |
| 0.95 | 898 | 3.6% |
| 0.52 | 799 | 3.2% |
| Other values (49) | 11429 | 45.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 215 | 0.9% |
| 0.01 | 230 | 0.9% |
| 0.03 | 715 | 2.9% |
| 0.04 | 144 | 0.6% |
| 0.06 | 57 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.93 | 1413 | 5.7% |
| 0.94 | 1027 | 4.1% |
| 0.95 | 898 | 3.6% |
| 0.96 | 957 | 3.8% |
| 0.97 | 247 | 1.0% |
tag
Numeric
| Distinct count | 25000 |
|---|---|
| Unique (%) | 100.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 139880 |
| Minimum | 26000 |
| Maximum | 253763 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 37367 |
| Q1 | 83022 |
| Median | 139880 |
| Q3 | 196760 |
| 95-th percentile | 242390 |
| Maximum | 253763 |
| Range | 227763 |
| Interquartile range | 113740 |
Descriptive statistics
| Standard deviation | 65731 |
|---|---|
| Coef of variation | 0.46992 |
| Kurtosis | -1.199 |
| Mean | 139880 |
| MAD | 56921 |
| Skewness | 6.0424e-05 |
| Sum | 3496973366 |
| Variance | 4320600000 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 165887 | 1 | 0.0% |
| 193211 | 1 | 0.0% |
| 86687 | 1 | 0.0% |
| 174752 | 1 | 0.0% |
| 41633 | 1 | 0.0% |
| 144035 | 1 | 0.0% |
| 232100 | 1 | 0.0% |
| 98981 | 1 | 0.0% |
| 39590 | 1 | 0.0% |
| 201383 | 1 | 0.0% |
| Other values (24990) | 24990 | 100.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 26000 | 1 | 0.0% |
| 26009 | 1 | 0.0% |
| 26018 | 1 | 0.0% |
| 26027 | 1 | 0.0% |
| 26036 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 253727 | 1 | 0.0% |
| 253736 | 1 | 0.0% |
| 253745 | 1 | 0.0% |
| 253754 | 1 | 0.0% |
| 253763 | 1 | 0.0% |

| | category_r_squared_5years | 5yrs_sharpe_ratio_fund | 5_years_alpha_fund | 5years_fund_r_squared | 5years_fund_std | 5yrs_sharpe_ratio_category | 5_years_beta_fund | 5yrs_treynor_ratio_fund | 5_years_return_mean_annual_fund | 5_years_return_mean_annual_category | 5yrs_treynor_ratio_category | 5_years_return_fund | 5_years_alpha_category | 5_years_beta_category | 5years_category_std | tag | 5_years_return_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.51 | NaN | NaN | NaN | NaN | -0.00 | NaN | NaN | NaN | -0.00 | -0.04 | 0.00 | -0.11 | 0.01 | 0.20 | 67922 | -4.25 |
| 1 | 0.86 | 0.89 | 0.34 | 90.11 | 12.40 | 0.01 | 1.05 | 10.37 | 0.99 | 0.01 | 0.10 | 11.71 | -0.00 | 0.01 | 0.13 | 134783 | 11.26 |
| 2 | 0.86 | 1.15 | 2.96 | 89.02 | 10.28 | 0.01 | 0.86 | 13.84 | 1.05 | 0.01 | 0.10 | 12.78 | -0.00 | 0.01 | 0.13 | 61271 | 11.26 |
| 3 | 0.86 | 0.77 | -0.50 | 82.36 | 8.53 | 0.01 | 0.69 | 9.3 | 0.62 | 0.01 | 0.07 | 7.25 | -0.03 | 0.01 | 0.11 | 64412 | 7.20 |
| 4 | 0.96 | NaN | NaN | NaN | NaN | 0.01 | NaN | NaN | NaN | 0.01 | 0.04 | 0.00 | -0.01 | 0.01 | 0.09 | 184058 | 5.95 |
# return_10year contains 17 columns describing 10-year returns and risk ratios (alpha, beta, R-squared, standard deviation, Sharpe and Treynor ratios) for each fund and its category
return_10year = pd.read_csv('Hackathon_Files/external/return_10year.csv')
pandas_profiling.ProfileReport(return_10year)
Dataset info
| Number of variables | 17 |
|---|---|
| Number of observations | 25000 |
| Total Missing (%) | 12.3% |
| Total size in memory | 3.2 MiB |
| Average record size in memory | 136.0 B |
Variables types
| Numeric | 14 |
|---|---|
| Categorical | 1 |
| Boolean | 0 |
| Date | 0 |
| Text (Unique) | 1 |
| Rejected | 1 |
| Unsupported | 0 |
Warnings
- 10_years_alpha_category has 4900 / 19.6% zeros (Zeros)
- 10_years_alpha_fund has 8584 / 34.3% missing values (Missing)
- 10_years_beta_category has 3129 / 12.5% zeros (Zeros)
- 10_years_beta_fund has 8584 / 34.3% missing values (Missing)
- 10_years_return_category has 387 / 1.5% zeros (Zeros)
- 10_years_return_fund has 8475 / 33.9% zeros (Zeros)
- 10_years_return_mean_annual_category has 5774 / 23.1% zeros (Zeros)
- 10_years_return_mean_annual_fund is highly correlated with 10_years_return_fund (ρ = 0.99243) (Rejected)
- 10years_category_r_squared has 616 / 2.5% zeros (Zeros)
- 10years_category_std has 386 / 1.5% zeros (Zeros)
- 10years_fund_r_squared has 8584 / 34.3% missing values (Missing)
- 10years_fund_std has 8584 / 34.3% missing values (Missing)
- 10yrs_sharpe_ratio_category has 1112 / 4.4% zeros (Zeros)
- 10yrs_sharpe_ratio_fund has 8584 / 34.3% missing values (Missing)
- 10yrs_treynor_ratio_category has 414 / 1.7% zeros (Zeros)
- 10yrs_treynor_ratio_fund has 8584 / 34.3% missing values (Missing)
- 10yrs_treynor_ratio_fund has a high cardinality: 2753 distinct values (Warning)

10_years_alpha_category
Numeric
| Distinct count | 18 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0011356 |
| Minimum | -0.11 |
| Maximum | 0.1 |
| Zeros (%) | 19.6% |
Quantile statistics
| Minimum | -0.11 |
|---|---|
| 5-th percentile | -0.03 |
| Q1 | -0.02 |
| Median | -0 |
| Q3 | 0.01 |
| 95-th percentile | 0.06 |
| Maximum | 0.1 |
| Range | 0.21 |
| Interquartile range | 0.03 |
Descriptive statistics
| Standard deviation | 0.027795 |
|---|---|
| Coef of variation | 24.476 |
| Kurtosis | 1.7129 |
| Mean | 0.0011356 |
| MAD | 0.02014 |
| Skewness | 0.91709 |
| Sum | 28.27 |
| Variance | 0.00077256 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 5346 | 21.4% |
| 0.0 | 4900 | 19.6% |
| -0.02 | 4605 | 18.4% |
| 0.01 | 2769 | 11.1% |
| 0.04 | 1946 | 7.8% |
| -0.03 | 1503 | 6.0% |
| 0.02 | 981 | 3.9% |
| 0.08 | 664 | 2.7% |
| -0.04 | 546 | 2.2% |
| 0.06 | 509 | 2.0% |
| Other values (7) | 1125 | 4.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.11 | 77 | 0.3% |
| -0.06 | 150 | 0.6% |
| -0.05 | 25 | 0.1% |
| -0.04 | 546 | 2.2% |
| -0.03 | 1503 | 6.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.05 | 193 | 0.8% |
| 0.06 | 509 | 2.0% |
| 0.07 | 428 | 1.7% |
| 0.08 | 664 | 2.7% |
| 0.1 | 50 | 0.2% |

10_years_alpha_fund
Numeric
| Distinct count | 1810 |
|---|---|
| Unique (%) | 7.2% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -0.0031475 |
| Minimum | -25.97 |
| Maximum | 14.86 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -25.97 |
|---|---|
| 5-th percentile | -4.3625 |
| Q1 | -1.74 |
| Median | -0.3 |
| Q3 | 1.28 |
| 95-th percentile | 6.66 |
| Maximum | 14.86 |
| Range | 40.83 |
| Interquartile range | 3.02 |
Descriptive statistics
| Standard deviation | 3.2756 |
|---|---|
| Coef of variation | -1040.7 |
| Kurtosis | 3.4222 |
| Mean | -0.0031475 |
| MAD | 2.299 |
| Skewness | 0.10365 |
| Sum | -51.67 |
| Variance | 10.729 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| -0.55 | 51 | 0.2% |
| -0.16 | 49 | 0.2% |
| -0.18 | 48 | 0.2% |
| -0.32 | 47 | 0.2% |
| -0.3 | 47 | 0.2% |
| -0.82 | 45 | 0.2% |
| -0.58 | 45 | 0.2% |
| -0.56 | 44 | 0.2% |
| -0.08 | 44 | 0.2% |
| -0.46 | 44 | 0.2% |
| Other values (1799) | 15952 | 63.8% |
| (Missing) | 8584 | 34.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -25.97 | 1 | 0.0% |
| -25.02 | 1 | 0.0% |
| -24.7 | 1 | 0.0% |
| -23.71 | 1 | 0.0% |
| -22.16 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 13.48 | 1 | 0.0% |
| 13.74 | 1 | 0.0% |
| 13.89 | 1 | 0.0% |
| 14.52 | 1 | 0.0% |
| 14.86 | 1 | 0.0% |

10_years_beta_category
Numeric
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0090757 |
| Minimum | -0.02 |
| Maximum | 0.12 |
| Zeros (%) | 12.5% |
Quantile statistics
| Minimum | -0.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.01 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.12 |
| Range | 0.14 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 0.0054928 |
|---|---|
| Coef of variation | 0.60522 |
| Kurtosis | 187.34 |
| Mean | 0.0090757 |
| MAD | 0.0024053 |
| Skewness | 8.64 |
| Sum | 225.93 |
| Variance | 3.017e-05 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 21030 | 84.1% |
| 0.0 | 3129 | 12.5% |
| 0.02 | 629 | 2.5% |
| -0.02 | 53 | 0.2% |
| 0.12 | 28 | 0.1% |
| 0.03 | 25 | 0.1% |
| (Missing) | 106 | 0.4% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 3129 | 12.5% |
| 0.01 | 21030 | 84.1% |
| 0.02 | 629 | 2.5% |
| 0.03 | 25 | 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 3129 | 12.5% |
| 0.01 | 21030 | 84.1% |
| 0.02 | 629 | 2.5% |
| 0.03 | 25 | 0.1% |
| 0.12 | 28 | 0.1% |

10_years_beta_fund
Numeric
| Distinct count | 305 |
|---|---|
| Unique (%) | 1.2% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.96322 |
| Minimum | -88.06 |
| Maximum | 49.29 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -88.06 |
|---|---|
| 5-th percentile | 0.31 |
| Q1 | 0.86 |
| Median | 1.01 |
| Q3 | 1.13 |
| 95-th percentile | 1.42 |
| Maximum | 49.29 |
| Range | 137.35 |
| Interquartile range | 0.27 |
Descriptive statistics
| Standard deviation | 1.5826 |
|---|---|
| Coef of variation | 1.6431 |
| Kurtosis | 1624.7 |
| Mean | 0.96322 |
| MAD | 0.28399 |
| Skewness | -19.289 |
| Sum | 15812 |
| Variance | 2.5047 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 567 | 2.3% |
| 1.06 | 483 | 1.9% |
| 1.01 | 439 | 1.8% |
| 1.04 | 403 | 1.6% |
| 1.02 | 383 | 1.5% |
| 1.08 | 355 | 1.4% |
| 0.96 | 345 | 1.4% |
| 1.05 | 342 | 1.4% |
| 0.99 | 326 | 1.3% |
| 0.98 | 324 | 1.3% |
| Other values (294) | 12449 | 49.8% |
| (Missing) | 8584 | 34.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -88.06 | 1 | 0.0% |
| -87.88 | 1 | 0.0% |
| -48.87 | 1 | 0.0% |
| -48.76 | 1 | 0.0% |
| -48.75 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 38.24 | 2 | 0.0% |
| 38.25 | 1 | 0.0% |
| 49.18 | 1 | 0.0% |
| 49.21 | 1 | 0.0% |
| 49.29 | 1 | 0.0% |

10_years_return_category
Numeric
| Distinct count | 100 |
|---|---|
| Unique (%) | 0.4% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 9.6793 |
| Minimum | -24.99 |
| Maximum | 18.72 |
| Zeros (%) | 1.5% |
Quantile statistics
| Minimum | -24.99 |
|---|---|
| 5-th percentile | 1.72 |
| Q1 | 6.44 |
| Median | 9.97 |
| Q3 | 14.12 |
| 95-th percentile | 15.94 |
| Maximum | 18.72 |
| Range | 43.71 |
| Interquartile range | 7.68 |
Descriptive statistics
| Standard deviation | 4.9264 |
|---|---|
| Coef of variation | 0.50896 |
| Kurtosis | 4.0661 |
| Mean | 9.6793 |
| MAD | 3.9918 |
| Skewness | -0.9968 |
| Sum | 240870 |
| Variance | 24.27 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 15.94 | 1333 | 5.3% |
| 14.54 | 1270 | 5.1% |
| 13.68 | 1121 | 4.5% |
| 4.56 | 957 | 3.8% |
| 11.79 | 938 | 3.8% |
| 8.92 | 757 | 3.0% |
| 9.98 | 708 | 2.8% |
| 14.67 | 683 | 2.7% |
| 8.43 | 680 | 2.7% |
| 16.24 | 669 | 2.7% |
| Other values (89) | 15769 | 63.1% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -24.99 | 53 | 0.2% |
| -3.06 | 109 | 0.4% |
| -2.95 | 57 | 0.2% |
| -2.76 | 114 | 0.5% |
| 0.0 | 387 | 1.5% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 16.24 | 669 | 2.7% |
| 17.07 | 101 | 0.4% |
| 17.16 | 27 | 0.1% |
| 17.24 | 225 | 0.9% |
| 18.72 | 160 | 0.6% |

10_years_return_fund
Numeric
| Distinct count | 2275 |
|---|---|
| Unique (%) | 9.1% |
| Missing (%) | 0.5% |
| Missing (n) | 115 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 6.621 |
| Minimum | -38.56 |
| Maximum | 40.66 |
| Zeros (%) | 33.9% |
Quantile statistics
| Minimum | -38.56 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 5.9 |
| Q3 | 12.38 |
| 95-th percentile | 16.66 |
| Maximum | 40.66 |
| Range | 79.22 |
| Interquartile range | 12.38 |
Descriptive statistics
| Standard deviation | 6.5374 |
|---|---|
| Coef of variation | 0.98738 |
| Kurtosis | 0.78071 |
| Mean | 6.621 |
| MAD | 5.7164 |
| Skewness | -0.018269 |
| Sum | 164760 |
| Variance | 42.738 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 8475 | 33.9% |
| 15.14 | 24 | 0.1% |
| 15.42 | 24 | 0.1% |
| 9.24 | 23 | 0.1% |
| 13.11 | 21 | 0.1% |
| 13.26 | 21 | 0.1% |
| 13.63 | 20 | 0.1% |
| 14.05 | 20 | 0.1% |
| 12.41 | 20 | 0.1% |
| 10.15 | 20 | 0.1% |
| Other values (2264) | 16217 | 64.9% |
| (Missing) | 115 | 0.5% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -38.56 | 1 | 0.0% |
| -38.21 | 1 | 0.0% |
| -37.94 | 1 | 0.0% |
| -37.78 | 1 | 0.0% |
| -37.77 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 36.45 | 1 | 0.0% |
| 36.86 | 1 | 0.0% |
| 37.81 | 1 | 0.0% |
| 37.92 | 2 | 0.0% |
| 40.66 | 1 | 0.0% |

10_years_return_mean_annual_category
Numeric
| Distinct count | 5 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.007681 |
| Minimum | -0.02 |
| Maximum | 0.02 |
| Zeros (%) | 23.1% |
Quantile statistics
| Minimum | -0.02 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.01 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.02 |
| Range | 0.04 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 0.004514 |
|---|---|
| Coef of variation | 0.58769 |
| Kurtosis | 2.3657 |
| Mean | 0.007681 |
| MAD | 0.003681 |
| Skewness | -1.4002 |
| Sum | 191.21 |
| Variance | 2.0376e-05 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 18907 | 75.6% |
| 0.0 | 5774 | 23.1% |
| 0.02 | 160 | 0.6% |
| -0.02 | 53 | 0.2% |
| (Missing) | 106 | 0.4% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 5774 | 23.1% |
| 0.01 | 18907 | 75.6% |
| 0.02 | 160 | 0.6% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.02 | 53 | 0.2% |
| 0.0 | 5774 | 23.1% |
| 0.01 | 18907 | 75.6% |
| 0.02 | 160 | 0.6% |

10_years_return_mean_annual_fund
Highly correlated
This variable is highly correlated with 10_years_return_fund and should be ignored for analysis
| Correlation | 0.99243 |
|---|---|

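
pandas_profiling rejected 10_years_return_mean_annual_fund because its Pearson correlation with 10_years_return_fund is 0.99243. A minimal sketch of the same kind of check in plain pandas, assuming a cut-off of |ρ| ≥ 0.9 (our choice for illustration, not necessarily the library's exact rule). `highly_correlated_pairs` is a hypothetical helper and the toy data is synthetic. Only numeric columns are considered, so a text-typed column such as 10yrs_treynor_ratio_fund would be skipped.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, rho) for numeric column pairs with |Pearson rho| >= threshold."""
    # Pairwise Pearson correlation; NaNs are dropped pair by pair
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, so each pair appears once
            rho = corr.iloc[i, j]
            if pd.notna(rho) and abs(rho) >= threshold:
                pairs.append((cols[i], cols[j], float(rho)))
    return pairs

# Synthetic toy data: b is almost a linear function of a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({"a": a,
                    "b": 2 * a + rng.normal(scale=0.05, size=200),
                    "c": rng.normal(size=200)})
pairs = highly_correlated_pairs(toy)
print(pairs)
```

On the real file, `highly_correlated_pairs(return_10year)` would be expected to include the pair the report flagged; dropping one column from each flagged pair is then a modelling decision rather than something the check decides for you.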
10years_category_r_squared
Numeric
| Distinct count | 53 |
|---|---|
| Unique (%) | 0.2% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.73158 |
| Minimum | 0 |
| Maximum | 0.97 |
| Zeros (%) | 2.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.03 |
| Q1 | 0.71 |
| Median | 0.84 |
| Q3 | 0.92 |
| 95-th percentile | 0.95 |
| Maximum | 0.97 |
| Range | 0.97 |
| Interquartile range | 0.21 |
Descriptive statistics
| Standard deviation | 0.27368 |
|---|---|
| Coef of variation | 0.3741 |
| Kurtosis | 1.3046 |
| Mean | 0.73158 |
| MAD | 0.20389 |
| Skewness | -1.5742 |
| Sum | 18212 |
| Variance | 0.074903 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.94 | 2320 | 9.3% |
| 0.88 | 1485 | 5.9% |
| 0.82 | 1448 | 5.8% |
| 0.91 | 1333 | 5.3% |
| 0.73 | 1314 | 5.3% |
| 0.75 | 1277 | 5.1% |
| 0.9 | 1173 | 4.7% |
| 0.92 | 1105 | 4.4% |
| 0.93 | 1037 | 4.1% |
| 0.84 | 985 | 3.9% |
| Other values (42) | 11417 | 45.7% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 616 | 2.5% |
| 0.01 | 193 | 0.8% |
| 0.03 | 664 | 2.7% |
| 0.06 | 78 | 0.3% |
| 0.07 | 302 | 1.2% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.93 | 1037 | 4.1% |
| 0.94 | 2320 | 9.3% |
| 0.95 | 938 | 3.8% |
| 0.96 | 183 | 0.7% |
| 0.97 | 875 | 3.5% |

10years_category_std
Numeric
| Distinct count | 27 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.1094 |
| Minimum | 0 |
| Maximum | 0.34 |
| Zeros (%) | 1.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.02 |
| Q1 | 0.06 |
| Median | 0.12 |
| Q3 | 0.15 |
| 95-th percentile | 0.18 |
| Maximum | 0.34 |
| Range | 0.34 |
| Interquartile range | 0.09 |
Descriptive statistics
| Standard deviation | 0.055202 |
|---|---|
| Coef of variation | 0.5046 |
| Kurtosis | -0.38714 |
| Mean | 0.1094 |
| MAD | 0.047445 |
| Skewness | -0.017667 |
| Sum | 2723.4 |
| Variance | 0.0030473 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.13 | 2980 | 11.9% |
| 0.16 | 2473 | 9.9% |
| 0.14 | 2237 | 8.9% |
| 0.07 | 1738 | 7.0% |
| 0.04 | 1724 | 6.9% |
| 0.18 | 1623 | 6.5% |
| 0.03 | 1338 | 5.4% |
| 0.1 | 1281 | 5.1% |
| 0.05 | 1279 | 5.1% |
| 0.09 | 1074 | 4.3% |
| Other values (16) | 7147 | 28.6% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 386 | 1.5% |
| 0.01 | 449 | 1.8% |
| 0.02 | 550 | 2.2% |
| 0.03 | 1338 | 5.4% |
| 0.04 | 1724 | 6.9% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.22 | 2 | 0.0% |
| 0.24 | 53 | 0.2% |
| 0.25 | 15 | 0.1% |
| 0.26 | 92 | 0.4% |
| 0.34 | 57 | 0.2% |

10years_fund_r_squared
Numeric
| Distinct count | 5185 |
|---|---|
| Unique (%) | 20.7% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 76.603 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5.8875 |
| Q1 | 72.52 |
| Median | 86.08 |
| Q3 | 93.77 |
| 95-th percentile | 97.51 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range | 21.25 |
Descriptive statistics
| Standard deviation | 25.699 |
|---|---|
| Coef of variation | 0.33549 |
| Kurtosis | 2.1707 |
| Mean | 76.603 |
| MAD | 18.66 |
| Skewness | -1.7549 |
| Sum | 1257500 |
| Variance | 660.45 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 99.99 | 58 | 0.2% |
| 100.0 | 43 | 0.2% |
| 0.0 | 31 | 0.1% |
| 0.02 | 28 | 0.1% |
| 96.98 | 26 | 0.1% |
| 0.01 | 21 | 0.1% |
| 93.74 | 20 | 0.1% |
| 96.13 | 20 | 0.1% |
| 95.31 | 20 | 0.1% |
| 95.68 | 19 | 0.1% |
| Other values (5174) | 16130 | 64.5% |
| (Missing) | 8584 | 34.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 31 | 0.1% |
| 0.01 | 21 | 0.1% |
| 0.02 | 28 | 0.1% |
| 0.03 | 6 | 0.0% |
| 0.04 | 7 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 99.92 | 3 | 0.0% |
| 99.97 | 5 | 0.0% |
| 99.98 | 8 | 0.0% |
| 99.99 | 58 | 0.2% |
| 100.0 | 43 | 0.2% |

10years_fund_std
Numeric
| Distinct count | 2255 |
|---|---|
| Unique (%) | 9.0% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 11.419 |
| Minimum | 0.2 |
| Maximum | 52.29 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 0.2 |
|---|---|
| 5-th percentile | 2.46 |
| Q1 | 6.14 |
| Median | 12.74 |
| Q3 | 15.62 |
| 95-th percentile | 19.1 |
| Maximum | 52.29 |
| Range | 52.09 |
| Interquartile range | 9.48 |
Descriptive statistics
| Standard deviation | 5.9371 |
|---|---|
| Coef of variation | 0.51995 |
| Kurtosis | 1.0757 |
| Mean | 11.419 |
| MAD | 4.9435 |
| Skewness | 0.31377 |
| Sum | 187450 |
| Variance | 35.25 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 12.69 | 42 | 0.2% |
| 12.68 | 41 | 0.2% |
| 3.36 | 39 | 0.2% |
| 3.18 | 35 | 0.1% |
| 15.22 | 34 | 0.1% |
| 12.7 | 33 | 0.1% |
| 14.07 | 31 | 0.1% |
| 13.02 | 30 | 0.1% |
| 13.87 | 29 | 0.1% |
| 14.08 | 29 | 0.1% |
| Other values (2244) | 16073 | 64.3% |
| (Missing) | 8584 | 34.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.2 | 1 | 0.0% |
| 0.22 | 2 | 0.0% |
| 0.25 | 1 | 0.0% |
| 0.27 | 2 | 0.0% |
| 0.3 | 2 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 50.17 | 1 | 0.0% |
| 51.58 | 1 | 0.0% |
| 51.63 | 1 | 0.0% |
| 52.18 | 1 | 0.0% |
| 52.29 | 1 | 0.0% |

10yrs_sharpe_ratio_category
Numeric
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.0% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.0095107 |
| Minimum | -0.01 |
| Maximum | 0.01 |
| Zeros (%) | 4.4% |
Quantile statistics
| Minimum | -0.01 |
|---|---|
| 5-th percentile | 0.01 |
| Q1 | 0.01 |
| Median | 0.01 |
| Q3 | 0.01 |
| 95-th percentile | 0.01 |
| Maximum | 0.01 |
| Range | 0.02 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 0.0022537 |
|---|---|
| Coef of variation | 0.23697 |
| Kurtosis | 23.133 |
| Mean | 0.0095107 |
| MAD | 0.00093275 |
| Skewness | -4.729 |
| Sum | 236.76 |
| Variance | 5.0794e-06 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.01 | 23729 | 94.9% |
| 0.0 | 1112 | 4.4% |
| -0.01 | 53 | 0.2% |
| (Missing) | 106 | 0.4% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 53 | 0.2% |
| 0.0 | 1112 | 4.4% |
| 0.01 | 23729 | 94.9% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.01 | 53 | 0.2% |
| 0.0 | 1112 | 4.4% |
| 0.01 | 23729 | 94.9% |

10yrs_sharpe_ratio_fund
Numeric
| Distinct count | 322 |
|---|---|
| Unique (%) | 1.3% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.93749 |
| Minimum | -6.58 |
| Maximum | 3.01 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | -6.58 |
|---|---|
| 5-th percentile | 0.43 |
| Q1 | 0.8 |
| Median | 0.96 |
| Q3 | 1.12 |
| 95-th percentile | 1.41 |
| Maximum | 3.01 |
| Range | 9.59 |
| Interquartile range | 0.32 |
Descriptive statistics
| Standard deviation | 0.34227 |
|---|---|
| Coef of variation | 0.36509 |
| Kurtosis | 23.314 |
| Mean | 0.93749 |
| MAD | 0.22893 |
| Skewness | -2.1068 |
| Sum | 15390 |
| Variance | 0.11715 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.98 | 372 | 1.5% |
| 0.96 | 365 | 1.5% |
| 0.92 | 360 | 1.4% |
| 0.94 | 350 | 1.4% |
| 0.97 | 307 | 1.2% |
| 0.9 | 307 | 1.2% |
| 0.91 | 300 | 1.2% |
| 1.04 | 300 | 1.2% |
| 1.0 | 299 | 1.2% |
| 0.88 | 294 | 1.2% |
| Other values (311) | 13162 | 52.6% |
| (Missing) | 8584 | 34.3% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -6.58 | 1 | 0.0% |
| -2.01 | 1 | 0.0% |
| -1.88 | 1 | 0.0% |
| -1.85 | 1 | 0.0% |
| -1.76 | 1 | 0.0% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 2.44 | 1 | 0.0% |
| 2.46 | 1 | 0.0% |
| 2.78 | 1 | 0.0% |
| 2.89 | 1 | 0.0% |
| 3.01 | 1 | 0.0% |

10yrs_treynor_ratio_category
Numeric
| Distinct count | 30 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.4% |
| Missing (n) | 106 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 0.13884 |
| Minimum | -0.19 |
| Maximum | 4.68 |
| Zeros (%) | 1.7% |
Quantile statistics
| Minimum | -0.19 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.07 |
| Median | 0.1 |
| Q3 | 0.14 |
| 95-th percentile | 0.21 |
| Maximum | 4.68 |
| Range | 4.87 |
| Interquartile range | 0.07 |
Descriptive statistics
| Standard deviation | 0.44255 |
|---|---|
| Coef of variation | 3.1875 |
| Kurtosis | 99.465 |
| Mean | 0.13884 |
| MAD | 0.09513 |
| Skewness | 9.9758 |
| Sum | 3456.2 |
| Variance | 0.19585 |
| Memory size | 195.4 KiB |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.14 | 4428 | 17.7% |
| 0.08 | 3807 | 15.2% |
| 0.04 | 2092 | 8.4% |
| 0.09 | 1643 | 6.6% |
| 0.15 | 1445 | 5.8% |
| 0.1 | 1368 | 5.5% |
| 0.13 | 1349 | 5.4% |
| 0.05 | 1341 | 5.4% |
| 0.12 | 1247 | 5.0% |
| 0.11 | 967 | 3.9% |
| Other values (19) | 5207 | 20.8% |

Minimum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| -0.19 | 51 | 0.2% |
| -0.14 | 193 | 0.8% |
| -0.1 | 142 | 0.6% |
| -0.05 | 278 | 1.1% |
| -0.02 | 323 | 1.3% |

Maximum 5 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0.19 | 226 | 0.9% |
| 0.21 | 372 | 1.5% |
| 0.23 | 677 | 2.7% |
| 0.3 | 50 | 0.2% |
| 4.68 | 230 | 0.9% |

10yrs_treynor_ratio_fund
Categorical
| Distinct count | 2753 |
|---|---|
| Unique (%) | 11.0% |
| Missing (%) | 34.3% |
| Missing (n) | 8584 |

| Value | Count |
|---|---|
| 7.7 | 30 |
| 14.92 | 26 |
| 7.42 | 23 |
| Other values (2749) | 16337 |
| (Missing) | 8584 |

| Value | Count | Frequency (%) |
|---|---|---|
| 7.7 | 30 | 0.1% |
| 14.92 | 26 | 0.1% |
| 7.42 | 23 | 0.1% |
| 7.6 | 22 | 0.1% |
| 14.24 | 22 | 0.1% |
| 13.42 | 22 | 0.1% |
| 8.48 | 21 | 0.1% |
| 12.02 | 21 | 0.1% |
| 14.52 | 21 | 0.1% |
| 12.37 | 21 | 0.1% |
| Other values (2742) | 16187 | 64.7% |
| (Missing) | 8584 | 34.3% |

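
The report types 10yrs_treynor_ratio_fund as Categorical even though every value shown is a number, which suggests the column was read as text, possibly because some entries are not valid numbers (the report does not show which). A minimal sketch, assuming the column is meant to be numeric, that coerces it with `pd.to_numeric(..., errors="coerce")` and counts how many present entries failed to parse. `coerce_numeric` is a hypothetical helper and the sample values are made up.

```python
import numpy as np
import pandas as pd

def coerce_numeric(series):
    """Convert a column to float; entries that cannot be parsed become NaN."""
    converted = pd.to_numeric(series, errors="coerce")
    # Present before, missing after: these are the entries that failed to parse
    failed = series.notna() & converted.isna()
    return converted, int(failed.sum())

# Hypothetical sample: numbers stored as text, one junk entry, one genuinely missing value
sample = pd.Series(["7.7", "14.92", "7.42", "n/a", np.nan])
values, n_failed = coerce_numeric(sample)
print(values.tolist(), n_failed)
```

On the real data this would be `return_10year["10yrs_treynor_ratio_fund"], n_bad = coerce_numeric(return_10year["10yrs_treynor_ratio_fund"])`; inspecting the rows counted in `n_bad` before discarding them is worthwhile, since they may carry a meaning (for example a placeholder) that NaN hides.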
fund_id
Categorical, Unique

| First 3 values |
|---|
| e7dff334-3313-4348-917a-64c631da08f1 |
| abf7f06e-6d96-4016-a9c8-2c7975ecf778 |
| 0edb76db-aca6-4b0f-8e4e-772674e188fa |

| Last 3 values |
|---|
| 5c653690-cbea-4370-908e-582b0c74cc2d |
| c97e052e-0f2d-42bb-bacd-f58e116d4c85 |
| 819f40d9-f07d-480d-9be8-045999bbb7f5 |

First 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 0002e898-709a-4b80-8f5c-ec846feff26c | 1 | 0.0% |
| 00070160-01a2-4ad3-9290-958a110c8e9f | 1 | 0.0% |
| 0009d9da-6735-46c1-81cd-dbc62c53c2e2 | 1 | 0.0% |
| 000ad9cc-3f7e-48f3-a1f1-4f5c03d3eb6d | 1 | 0.0% |
| 000b6091-3c16-41a1-9df4-fce73767dd21 | 1 | 0.0% |

Last 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| fff6de73-cbbd-4814-a59a-f0210d669eae | 1 | 0.0% |
| fff75f2a-1419-4d65-a68f-89d601d47350 | 1 | 0.0% |
| fff79179-2ca5-4f26-a023-929c255aeda4 | 1 | 0.0% |
| fffb0e0f-2dc9-4e86-b534-476f9669720b | 1 | 0.0% |
| fffe9b65-2288-4d99-844e-89e7747aa323 | 1 | 0.0% |

| | 10years_category_r_squared | 10yrs_sharpe_ratio_fund | 10_years_alpha_fund | 10years_fund_r_squared | 10years_fund_std | 10yrs_sharpe_ratio_category | 10_years_beta_fund | 10yrs_treynor_ratio_fund | fund_id | 10_years_return_mean_annual_category | 10yrs_treynor_ratio_category | 10_years_return_fund | 10_years_alpha_category | 10_years_beta_category | 10years_category_std | 10_years_return_mean_annual_fund | 10_years_return_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.49 | NaN | NaN | NaN | NaN | 0.01 | NaN | NaN | 264614c6-5ac3-4146-ba26-1674b136cb40 | 0.01 | 0.21 | 0.00 | 0.06 | 0.01 | 0.13 | NaN | 14.30 |
| 1 | 0.88 | 1.16 | 0.16 | 91.68 | 14.30 | 0.01 | 1.08 | 15.57 | f5ad58c2-fdea-4087-8678-e04744f89f90 | 0.01 | 0.15 | 17.25 | -0.01 | 0.01 | 0.14 | 1.42 | 15.94 |
| 2 | 0.88 | 1.22 | 1.00 | 90.69 | 12.68 | 0.01 | 0.95 | 16.58 | 3c13f4ab-02c4-4ca7-a133-7e996ec5d0c4 | 0.01 | 0.15 | 16.21 | -0.01 | 0.01 | 0.14 | 1.33 | 15.94 |
| 3 | 0.90 | 1.20 | 0.75 | 89.03 | 11.21 | 0.01 | 0.84 | 16.38 | ff78bdd8-59eb-4cef-9f3c-b1baacce9554 | 0.01 | 0.14 | 14.12 | -0.02 | 0.01 | 0.13 | 1.16 | 13.68 |
| 4 | 0.97 | NaN | NaN | NaN | NaN | 0.01 | NaN | NaN | 63d8406d-c525-494a-8e03-d4fc4cfcb571 | 0.01 | 0.08 | 0.00 | -0.02 | 0.01 | 0.12 | NaN | 11.53 |
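
The Warnings list in the return_10year report counts missing values and zeros per column. A minimal plain-pandas sketch that computes the same two percentages, both relative to all rows (which matches the report's convention: 8475 / 25000 = 33.9%). The `min_pct` cut-off is a parameter we made up; the sketch does not try to reproduce the library's exact rules for when a warning is raised, and the toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

def missing_and_zero_summary(df, min_pct=0.0):
    """Per-column % missing and % zeros (over all rows), keeping columns above min_pct."""
    rows = {}
    for col in df.columns:
        s = df[col]
        missing_pct = 100 * s.isna().mean()
        # Zero counts only make sense for numeric columns; NaN never compares equal to 0
        zeros_pct = 100 * (s == 0).mean() if pd.api.types.is_numeric_dtype(s) else np.nan
        rows[col] = {"missing_pct": missing_pct, "zeros_pct": zeros_pct}
    summary = pd.DataFrame.from_dict(rows, orient="index")
    flagged = (summary["missing_pct"] > min_pct) | (summary["zeros_pct"] > min_pct)
    return summary[flagged]

# Hypothetical toy frame: x has zeros and a gap, y is clean, name is text with a gap
toy = pd.DataFrame({"x": [0.0, 1.0, np.nan, 0.0],
                    "y": [1.0, 2.0, 3.0, 4.0],
                    "name": ["a", "b", None, "d"]})
report = missing_and_zero_summary(toy)
print(report)
```

Several fund-level columns report exactly 8584 missing values; `return_10year[["10_years_alpha_fund", "10years_fund_std"]].isna().all(axis=1).sum()` would show whether they are missing on the same rows.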

default: (numerical, target)
Default on the credit: 1 - yes (bad credit), 0 - no (good credit)
account_check_status: (qualitative)
Status of existing checking account
A11 : ... < 0 DM (DM - Deutsche Mark)
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM / salary assignments for at least 1 year
A14 : no checking account
duration_in_month: (numerical)
Duration in month
credit_history: (qualitative)
Credit history
A30 : no credits taken/ all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/ other credits existing (not at this bank)
purpose: (qualitative)
Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others
credit_amount: (numerical)
Credit amount
savings: (qualitative)
Savings account/bonds
A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : .. >= 1000 DM
A65 : unknown/ no savings account
present_emp_since: (qualitative)
Present employment since
A71 : unemployed
A72 : ... < 1 year
A73 : 1 <= ... < 4 years
A74 : 4 <= ... < 7 years
A75 : .. >= 7 years
installment_as_income_perc: (numerical)
Installment rate in percentage of disposable income
personal_status_sex: (qualitative)
Personal status and sex
A91 : male : divorced/separated
A92 : female : divorced/separated/married
A93 : male : single
A94 : male : married/widowed
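A95 : female : single
other_debtors: (qualitative)
Other debtors / guarantors
A101 : none
A102 : co-applicant
A103 : guarantor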
present_res_since: (numerical)
Present residence since
property: (qualitative)
Property
A121 : real estate
A122 : if not A121 : building society savings agreement/ life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property
age: (numerical)
Age in years
other_installment_plans: (qualitative)
Other installment plans
A141 : bank
A142 : stores
A143 : none
housing: (qualitative)
Housing
A151 : rent
A152 : own
A153 : for free
credits_this_bank : (numerical)
Number of existing credits at this bank
job : (qualitative)
Job
A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/ highly qualified employee/ officer
people_under_maintenance: (numerical)
Number of people being liable to provide maintenance for
telephone: (qualitative)
Telephone
A191 : none
A192 : yes, registered under the customers name
foreign_worker: (qualitative)
Foreign worker
A201 : yes
A202 : no
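In the raw UCI distribution of this dataset the qualitative attributes are stored as the codes listed above (A11, A12, ...), whereas the DataFrame used below already holds decoded labels, as the unique values printed further down show. If you start from the coded file instead, a minimal sketch of decoding one attribute: the mapping is transcribed from the account_check_status description above (using the label spelling that appears in the decoded data), while `decode` and the sample codes are hypothetical.

```python
import pandas as pd

# Code-to-label mapping transcribed from the account_check_status description
ACCOUNT_CHECK_STATUS = {
    "A11": "< 0 DM",
    "A12": "0 <= ... < 200 DM",
    "A13": ">= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
}

def decode(series, mapping):
    """Replace raw codes with labels; codes missing from the mapping become NaN."""
    return series.map(mapping)

# Hypothetical raw codes; A99 is deliberately invalid to show how unknown codes surface
raw = pd.Series(["A11", "A14", "A12", "A99"])
decoded = decode(raw, ACCOUNT_CHECK_STATUS)
print(decoded.tolist())
```

Mapping unknown codes to NaN, rather than leaving them untouched, makes a typo in the dictionary show up immediately in an `isnull().sum()` check.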
class color:
    # ANSI escape codes used to style printed output
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

# origCreditDf: the German credit DataFrame (assumed to be loaded already)
print("Size of dataframe is " + color.BOLD + format(origCreditDf.size) + color.END)
print("Shape(#rows,#columns) of dataframe is " + color.BOLD + format(origCreditDf.shape) + color.END)
print("Dataframe information \n")
print(origCreditDf.info())  # info() prints the summary itself and returns None
Size of dataframe is 21000
Shape(#rows,#columns) of dataframe is (1000, 21)
Dataframe information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
default 1000 non-null int64
account_check_status 1000 non-null object
duration_in_month 1000 non-null int64
credit_history 1000 non-null object
purpose 1000 non-null object
credit_amount 1000 non-null int64
savings 1000 non-null object
present_emp_since 1000 non-null object
installment_as_income_perc 1000 non-null int64
personal_status_sex 1000 non-null object
other_debtors 1000 non-null object
present_res_since 1000 non-null int64
property 1000 non-null object
age 1000 non-null int64
other_installment_plans 1000 non-null object
housing 1000 non-null object
credits_this_bank 1000 non-null int64
job 1000 non-null object
people_under_maintenance 1000 non-null int64
telephone 1000 non-null object
foreign_worker 1000 non-null object
dtypes: int64(8), object(13)
memory usage: 164.1+ KB
None
# Check for missing values in each column
origCreditDf.isnull().sum()
default 0
account_check_status 0
duration_in_month 0
credit_history 0
purpose 0
credit_amount 0
savings 0
present_emp_since 0
installment_as_income_perc 0
personal_status_sex 0
other_debtors 0
present_res_since 0
property 0
age 0
other_installment_plans 0
housing 0
credits_this_bank 0
job 0
people_under_maintenance 0
telephone 0
foreign_worker 0
dtype: int64
The dataset has no missing values.
## Five-point summary of the numerical attributes
origCreditDf.describe().transpose()
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| default | 1000.0 | 0.300 | 0.458487 | 0.0 | 0.0 | 0.0 | 1.00 | 1.0 |
| duration_in_month | 1000.0 | 20.903 | 12.058814 | 4.0 | 12.0 | 18.0 | 24.00 | 72.0 |
| credit_amount | 1000.0 | 3271.258 | 2822.736876 | 250.0 | 1365.5 | 2319.5 | 3972.25 | 18424.0 |
| installment_as_income_perc | 1000.0 | 2.973 | 1.118715 | 1.0 | 2.0 | 3.0 | 4.00 | 4.0 |
| present_res_since | 1000.0 | 2.845 | 1.103718 | 1.0 | 2.0 | 3.0 | 4.00 | 4.0 |
| age | 1000.0 | 35.546 | 11.375469 | 19.0 | 27.0 | 33.0 | 42.00 | 75.0 |
| credits_this_bank | 1000.0 | 1.407 | 0.577654 | 1.0 | 1.0 | 1.0 | 2.00 | 4.0 |
| people_under_maintenance | 1000.0 | 1.155 | 0.362086 | 1.0 | 1.0 | 1.0 | 1.00 | 2.0 |
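
`describe()` reports more than the five-number summary (it adds count, mean and std). The five numbers themselves (minimum, Q1, median, Q3, maximum) can be pulled out with `quantile`, which uses the same linear interpolation as `describe()` by default. A minimal sketch on a made-up series; `five_number_summary` is a hypothetical helper.

```python
import pandas as pd

def five_number_summary(series):
    """Minimum, lower quartile, median, upper quartile and maximum of a numeric Series."""
    q = series.quantile([0.0, 0.25, 0.5, 0.75, 1.0])  # linear interpolation by default
    return dict(zip(["min", "q1", "median", "q3", "max"], q.tolist()))

values = pd.Series([1, 2, 3, 4, 100])  # hypothetical values with one large outlier
summary = five_number_summary(values)
print(summary)
```

For several columns at once, `origCreditDf[["age", "credit_amount"]].quantile([0, 0.25, 0.5, 0.75, 1])` returns the same numbers as a DataFrame. Comparing the 75% and max columns of the table above (for example 3972.25 vs 18424.0 for credit_amount) is a quick way to spot long right tails.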
obj_origCreditDf=origCreditDf.select_dtypes(include=['object']).copy()
obj_origCreditDf.head(5)
print('defaulters :',origCreditDf['default'].unique())
# Number of 'good' credits (should be 700) and 'bad' credits (should be 300)
origCreditDf['default'].value_counts()
defaulters : [0 1]
0 700
1 300
Name: default, dtype: int64
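With 700 good and 300 bad credits the classes are imbalanced: a model that always predicts "no default" is already 70% accurate, so accuracy alone is a weak yardstick here. A minimal sketch that turns the counts into proportions and the majority-class baseline; `class_balance` is a hypothetical helper and the target below is synthetic, built only to mirror the 700/300 split.

```python
import pandas as pd

def class_balance(target):
    """Class proportions and the accuracy of always predicting the majority class."""
    proportions = target.value_counts(normalize=True)
    return proportions, float(proportions.max())

# Synthetic target with the same 700/300 split as the default column
target = pd.Series([0] * 700 + [1] * 300, name="default")
proportions, baseline = class_balance(target)
print(proportions.to_dict(), baseline)
```

On the real data, `class_balance(origCreditDf["default"])` gives the baseline any classifier should beat.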
print("Shape(#rows,#columns) of dataframe is "+color.BOLD+ format(obj_origCreditDf.shape) + color.END)
print(obj_origCreditDf.info())
print(obj_origCreditDf.columns)
Shape(#rows,#columns) of dataframe is (1000, 13)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
account_check_status 1000 non-null object
credit_history 1000 non-null object
purpose 1000 non-null object
savings 1000 non-null object
present_emp_since 1000 non-null object
personal_status_sex 1000 non-null object
other_debtors 1000 non-null object
property 1000 non-null object
other_installment_plans 1000 non-null object
housing 1000 non-null object
job 1000 non-null object
telephone 1000 non-null object
foreign_worker 1000 non-null object
dtypes: object(13)
memory usage: 101.6+ KB
None
Index(['account_check_status', 'credit_history', 'purpose', 'savings',
'present_emp_since', 'personal_status_sex', 'other_debtors', 'property',
'other_installment_plans', 'housing', 'job', 'telephone',
'foreign_worker'],
dtype='object')
# Let's see the possible values of the categorical variables in the data
for col in obj_origCreditDf.columns:
    print(col, ':', obj_origCreditDf[col].unique())
account_check_status : ['< 0 DM' '0 <= ... < 200 DM' 'no checking account' '>= 200 DM / salary assignments for at least 1 year']
credit_history : ['critical account/ other credits existing (not at this bank)' 'existing credits paid back duly till now' 'delay in paying off in the past' 'no credits taken/ all credits paid back duly' 'all credits at this bank paid back duly']
purpose : ['domestic appliances' '(vacation - does not exist?)' 'radio/television' 'car (new)' 'car (used)' 'business' 'repairs' 'education' 'furniture/equipment' 'retraining']
savings : ['unknown/ no savings account' '... < 100 DM' '500 <= ... < 1000 DM ' '.. >= 1000 DM ' '100 <= ... < 500 DM']
present_emp_since : ['.. >= 7 years' '1 <= ... < 4 years' '4 <= ... < 7 years' 'unemployed' '... < 1 year ']
personal_status_sex : ['male : single' 'female : divorced/separated/married' 'male : divorced/separated' 'male : married/widowed']
other_debtors : ['none' 'guarantor' 'co-applicant']
property : ['real estate' 'if not A121 : building society savings agreement/ life insurance' 'unknown / no property' 'if not A121/A122 : car or other, not in attribute 6']
other_installment_plans : ['none' 'bank' 'stores']
housing : ['own' 'for free' 'rent']
job : ['skilled employee / official' 'unskilled - resident' 'management/ self-employed/ highly qualified employee/ officer' 'unemployed/ unskilled - non-resident']
telephone : ['yes, registered under the customers name ' 'none']
foreign_worker : ['yes' 'no']
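# Let's check the associations among the columns of the dataframe (dython: Theil's U for nominal-nominal pairs since theil_u=True, correlation ratio for nominal-continuous pairs, Pearson's r for continuous-continuous pairs)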
from dython.nominal import associations
corr_df=associations(origCreditDf, nominal_columns=['default','account_check_status', 'credit_history', 'purpose', 'savings',
'present_emp_since', 'personal_status_sex', 'other_debtors', 'property',
'other_installment_plans', 'housing', 'job', 'telephone',
'foreign_worker'], mark_columns=True, theil_u=True, plot=True, return_results=True)
corr_df
| | default (nom) | account_check_status (nom) | duration_in_month (con) | credit_history (nom) | purpose (nom) | credit_amount (con) | savings (nom) | present_emp_since (nom) | installment_as_income_perc (con) | personal_status_sex (nom) | ... | present_res_since (con) | property (nom) | age (con) | other_installment_plans (nom) | housing (nom) | credits_this_bank (con) | job (nom) | people_under_maintenance (con) | telephone (nom) | foreign_worker (nom) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| default (nom) | 1.000000 | 0.107500 | 0.214927 | 0.049493 | 0.028247 | 0.154739 | 0.031902 | 0.014867 | 0.072404 | 0.007728 | ... | 0.002967 | 0.019273 | 0.091127 | 0.010071 | 0.014471 | 0.045732 | 0.001517 | 0.003015 | 0.001093 | 0.006607 |
| account_check_status (nom) | 0.052573 | 1.000000 | 0.118855 | 0.024831 | 0.027883 | 0.145556 | 0.037330 | 0.011421 | 0.074606 | 0.005207 | ... | 0.108725 | 0.006893 | 0.090730 | 0.001695 | 0.007726 | 0.097804 | 0.006513 | 0.076944 | 0.002639 | 0.002581 |
| duration_in_month (con) | 0.214927 | 0.118855 | 1.000000 | 0.194654 | 0.273692 | 0.624984 | 0.105586 | 0.093996 | 0.074749 | 0.133419 | ... | 0.034067 | 0.304274 | -0.036136 | 0.077902 | 0.192174 | -0.011284 | 0.218688 | -0.023834 | 0.164718 | 0.138196 |
| credit_history (nom) | 0.025480 | 0.026139 | 0.194654 | 1.000000 | 0.039900 | 0.193283 | 0.009393 | 0.017312 | 0.072874 | 0.011302 | ... | 0.098787 | 0.007824 | 0.176836 | 0.030006 | 0.007789 | 0.595094 | 0.005593 | 0.097687 | 0.002144 | 0.003431 |
| purpose (nom) | 0.009335 | 0.018842 | 0.273692 | 0.025614 | 1.000000 | 0.370954 | 0.014289 | 0.015998 | 0.182953 | 0.018441 | ... | 0.151836 | 0.033483 | 0.171765 | 0.010225 | 0.022393 | 0.146968 | 0.029645 | 0.163750 | 0.013384 | 0.007179 |
| credit_amount (con) | 0.154739 | 0.145556 | 0.624984 | 0.193283 | 0.370954 | 1.000000 | 0.129507 | 0.111905 | -0.271316 | 0.187014 | ... | 0.028926 | 0.318339 | 0.032716 | 0.048336 | 0.201812 | 0.020795 | 0.334607 | 0.017142 | 0.276995 | 0.050050 |
| savings (nom) | 0.016658 | 0.039858 | 0.105586 | 0.009528 | 0.022578 | 0.129507 | 1.000000 | 0.013570 | 0.046553 | 0.004986 | ... | 0.099015 | 0.007897 | 0.112603 | 0.000398 | 0.001975 | 0.074588 | 0.006799 | 0.033914 | 0.003656 | 0.000748 |
| present_emp_since (nom) | 0.006079 | 0.009549 | 0.093996 | 0.013751 | 0.019794 | 0.111905 | 0.010627 | 1.000000 | 0.140501 | 0.028665 | ... | 0.325431 | 0.020539 | 0.409607 | 0.003227 | 0.019023 | 0.154743 | 0.062587 | 0.097989 | 0.007580 | 0.003070 |
| installment_as_income_perc (con) | 0.072404 | 0.074606 | 0.074749 | 0.072874 | 0.182953 | -0.271316 | 0.046553 | 0.140501 | 1.000000 | 0.143033 | ... | 0.049302 | 0.055589 | 0.058266 | 0.057177 | 0.094890 | 0.021669 | 0.111352 | -0.071207 | 0.014413 | 0.090024 |
| personal_status_sex (nom) | 0.004445 | 0.006125 | 0.133419 | 0.012628 | 0.032098 | 0.187014 | 0.005493 | 0.040323 | 0.143033 | 1.000000 | ... | 0.113764 | 0.022221 | 0.245809 | 0.003382 | 0.040270 | 0.118680 | 0.009094 | 0.284250 | 0.003767 | 0.002179 |
| other_debtors (nom) | 0.008909 | 0.033111 | 0.048387 | 0.029159 | 0.061568 | 0.100164 | 0.036189 | 0.020880 | 0.014840 | 0.006117 | ... | 0.028335 | 0.056189 | 0.030888 | 0.008428 | 0.011689 | 0.025712 | 0.022499 | 0.048008 | 0.008195 | 0.013679 |
| present_res_since (con) | 0.002967 | 0.108725 | 0.034067 | 0.098787 | 0.151836 | 0.028926 | 0.099015 | 0.325431 | 0.049302 | 0.113764 | ... | 1.000000 | 0.191575 | 0.266419 | 0.055319 | 0.307190 | 0.089625 | 0.035411 | 0.042643 | 0.095359 | 0.054097 |
| property (nom) | 0.008720 | 0.006377 | 0.304274 | 0.006876 | 0.045841 | 0.318339 | 0.006843 | 0.022727 | 0.055589 | 0.017479 | ... | 0.191575 | 1.000000 | 0.224743 | 0.004725 | 0.166006 | 0.018524 | 0.041165 | 0.094770 | 0.014674 | 0.008211 |
| age (con) | 0.091127 | 0.090730 | -0.036136 | 0.176836 | 0.171765 | 0.032716 | 0.112603 | 0.409607 | 0.058266 | 0.245809 | ... | 0.266419 | 0.224743 | 1.000000 | 0.047069 | 0.307002 | 0.149254 | 0.164476 | 0.118201 | 0.145259 | 0.006151 |
| other_installment_plans (nom) | 0.010507 | 0.003616 | 0.077902 | 0.060810 | 0.032278 | 0.048336 | 0.000796 | 0.008232 | 0.057177 | 0.006134 | ... | 0.055319 | 0.010896 | 0.047069 | 1.000000 | 0.016603 | 0.050290 | 0.008896 | 0.077224 | 0.000864 | 0.003132 |
| housing (nom) | 0.011197 | 0.012223 | 0.192174 | 0.011707 | 0.052427 | 0.201812 | 0.002926 | 0.035994 | 0.094890 | 0.054167 | ... | 0.307190 | 0.283880 | 0.307002 | 0.012313 | 1.000000 | 0.058105 | 0.019500 | 0.126136 | 0.008705 | 0.005728 |
| credits_this_bank (con) | 0.045732 | 0.097804 | -0.011284 | 0.595094 | 0.146968 | 0.020795 | 0.074588 | 0.154743 | 0.021669 | 0.118680 | ... | 0.089625 | 0.018524 | 0.149254 | 0.050290 | 0.058105 | 1.000000 | 0.060502 | 0.109667 | 0.065553 | 0.009717 |
| job (nom) | 0.000946 | 0.008303 | 0.218688 | 0.006774 | 0.055931 | 0.334607 | 0.008119 | 0.095435 | 0.111352 | 0.009858 | ... | 0.035411 | 0.056728 | 0.164476 | 0.005317 | 0.015714 | 0.060502 | 1.000000 | 0.145956 | 0.098356 | 0.005135 |
| people_under_maintenance (con) | 0.003015 | 0.076944 | -0.023834 | 0.097687 | 0.163750 | 0.017142 | 0.033914 | 0.097989 | -0.071207 | 0.284250 | ... | 0.042643 | 0.094770 | 0.118201 | 0.077224 | 0.126136 | 0.109667 | 0.145956 | 1.000000 | 0.014753 | 0.077071 |
| telephone (nom) | 0.000990 | 0.004887 | 0.164718 | 0.003772 | 0.036672 | 0.276995 | 0.006340 | 0.016786 | 0.014413 | 0.005931 | ... | 0.095359 | 0.029368 | 0.145259 | 0.000750 | 0.010187 | 0.065553 | 0.142838 | 0.014753 | 1.000000 | 0.009860 |
| foreign_worker (nom) | 0.025499 | 0.020364 | 0.138196 | 0.025721 | 0.083834 | 0.050050 | 0.005530 | 0.028978 | 0.090024 | 0.014621 | ... | 0.054097 | 0.070038 | 0.006151 | 0.011586 | 0.028568 | 0.009717 | 0.031781 | 0.077071 | 0.042023 | 1.000000 |
21 rows × 21 columns
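Because `theil_u=True` was passed, nominal-nominal cells use Theil's U (the uncertainty coefficient), which is asymmetric: U(x|y) = (H(x) - H(x|y)) / H(x) is the share of the entropy of x that is explained by knowing y. That is why, for example, the default/account_check_status cell (0.107500) differs from the account_check_status/default cell (0.052573). A minimal NumPy/pandas sketch of the definition follows; it is not dython's implementation, and it does not assume which direction dython places in which cell. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def entropy(x: pd.Series) -> float:
    """Shannon entropy H(x) in nats."""
    p = x.value_counts(normalize=True).to_numpy()
    return float(-np.sum(p * np.log(p)))


def conditional_entropy(x: pd.Series, y: pd.Series) -> float:
    """H(x | y) = -sum over cells of p(x, y) * log(p(x, y) / p(y))."""
    joint = pd.crosstab(x, y, normalize="all").to_numpy()  # rows: x, columns: y
    p_y = joint.sum(axis=0)  # marginal distribution of y
    with np.errstate(divide="ignore", invalid="ignore"):
        # empty cells contribute 0 (the limit of p * log p as p -> 0)
        terms = np.where(joint > 0, joint * np.log(joint / p_y), 0.0)
    return float(-terms.sum())


def theils_u(x: pd.Series, y: pd.Series) -> float:
    """Uncertainty coefficient U(x | y): share of H(x) explained by y."""
    h_x = entropy(x)
    if h_x == 0:  # x is constant, nothing left to explain
        return 1.0
    return (h_x - conditional_entropy(x, y)) / h_x


x = pd.Series(["a", "a", "b", "c"])
y = pd.Series(["p", "p", "q", "q"])
# x fully determines y, but y only partly determines x
print(theils_u(y, x), theils_u(x, y))
```

Here knowing x removes all uncertainty about y (U = 1), while knowing y explains only two thirds of the entropy of x, which shows the asymmetry on a small example.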
# We will ignore very weak correlations
#0.00-0.19: very weak
#0.20-0.39: weak
#0.40-0.59: moderate
#0.60-0.79: strong
#0.80-1.00: very strong.
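# keep the strict upper triangle (each pair once, diagonal excluded); np.bool was removed from NumPy, plain bool works
corr_triu = corr_df.where(~np.tril(np.ones(corr_df.shape)).astype(bool))
corr_triu = corr_triu.stack()
# keeps only values above 0.19, so strong negative associations are not listed
corr_triu[corr_triu > 0.19]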
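| variable | associated with | association |
|---|---|---|
| default (nom) | duration_in_month (con) | 0.214927 |
| duration_in_month (con) | credit_history (nom) | 0.194654 |
| duration_in_month (con) | purpose (nom) | 0.273692 |
| duration_in_month (con) | credit_amount (con) | 0.624984 |
| duration_in_month (con) | property (nom) | 0.304274 |
| duration_in_month (con) | housing (nom) | 0.192174 |
| duration_in_month (con) | job (nom) | 0.218688 |
| credit_history (nom) | credit_amount (con) | 0.193283 |
| credit_history (nom) | credits_this_bank (con) | 0.595094 |
| purpose (nom) | credit_amount (con) | 0.370954 |
| credit_amount (con) | property (nom) | 0.318339 |
| credit_amount (con) | housing (nom) | 0.201812 |
| credit_amount (con) | job (nom) | 0.334607 |
| credit_amount (con) | telephone (nom) | 0.276995 |
| present_emp_since (nom) | present_res_since (con) | 0.325431 |
| present_emp_since (nom) | age (con) | 0.409607 |
| personal_status_sex (nom) | age (con) | 0.245809 |
| personal_status_sex (nom) | people_under_maintenance (con) | 0.284250 |
| present_res_since (con) | property (nom) | 0.191575 |
| present_res_since (con) | age (con) | 0.266419 |
| present_res_since (con) | housing (nom) | 0.307190 |
| property (nom) | age (con) | 0.224743 |
| age (con) | housing (nom) | 0.307002 |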
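The filter `corr_triu > 0.19` keeps only positive values, so a strong negative association such as credit_amount vs installment_as_income_perc (-0.271316 in the matrix) never shows up. A sketch of a sign-agnostic variant, assuming the same 0.19 cut-off; the helper `strong_pairs` is hypothetical. Like the cell above it reads only the strict upper triangle, so for the asymmetric Theil's U cells only one of the two directions is inspected.

```python
import numpy as np
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float) -> pd.Series:
    """Off-diagonal pairs whose |association| exceeds threshold, strongest first."""
    # strict upper triangle (k=1 excludes the diagonal), same cells as ~np.tril(...)
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask).stack().dropna()
    strong = upper[upper.abs() > threshold]
    # order by magnitude while keeping the original sign in the values
    return strong.reindex(strong.abs().sort_values(ascending=False).index)


toy = pd.DataFrame(
    [[1.0, -0.5, 0.1], [-0.5, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
print(strong_pairs(toy, 0.19))
```

Applied to the notebook's matrix it would be `strong_pairs(corr_df, 0.19)`.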
origCreditDf.columns
Index(['default', 'account_check_status', 'duration_in_month',
'credit_history', 'purpose', 'credit_amount', 'savings',
'present_emp_since', 'installment_as_income_perc',
'personal_status_sex', 'other_debtors', 'present_res_since', 'property',
'age', 'other_installment_plans', 'housing', 'credits_this_bank', 'job',
'people_under_maintenance', 'telephone', 'foreign_worker'],
dtype='object')
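# Let's drop the columns judged irrelevant. Most of them appear only in very weak associations (below 0.20);
# duration_in_month and telephone are dropped as well, although both appear in the list above (0.62 and 0.28 with credit_amount).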
selColumns_CreditDf=origCreditDf.copy()
selColumns_CreditDf=selColumns_CreditDf.drop(["account_check_status","duration_in_month","savings","installment_as_income_perc","other_debtors",
"other_installment_plans","telephone","foreign_worker"],axis=1)
print(selColumns_CreditDf.info())
print(selColumns_CreditDf.columns)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
default 1000 non-null int64
credit_history 1000 non-null object
purpose 1000 non-null object
credit_amount 1000 non-null int64
present_emp_since 1000 non-null object
personal_status_sex 1000 non-null object
present_res_since 1000 non-null int64
property 1000 non-null object
age 1000 non-null int64
housing 1000 non-null object
credits_this_bank 1000 non-null int64
job 1000 non-null object
people_under_maintenance 1000 non-null int64
dtypes: int64(6), object(7)
memory usage: 101.6+ KB
None
Index(['default', 'credit_history', 'purpose', 'credit_amount',
'present_emp_since', 'personal_status_sex', 'present_res_since',
'property', 'age', 'housing', 'credits_this_bank', 'job',
'people_under_maintenance'],
dtype='object')
# Boxplots of the columns in a single figure, to trace outliers
ax = sns.boxplot(data=selColumns_CreditDf, orient="h")
# The boxplots show outliers (points beyond the whiskers), e.g. in credit_amount and age.
# Rather than treating them with an interquartile-range (IQR) rule, we reduce the skew of age and credit_amount with a Box-Cox transform.
from scipy import stats
# stats.boxcox returns (transformed values, fitted lambda); [0] keeps the values.
# Note: .astype(int) truncates the transformed values to whole numbers, discarding much of their resolution.
selColumns_CreditDf['age']= stats.boxcox(selColumns_CreditDf['age'])[0].astype(int)
selColumns_CreditDf['credit_amount']=stats.boxcox(selColumns_CreditDf['credit_amount'])[0].astype(int)
ax = sns.boxplot(data=selColumns_CreditDf, orient="h")
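The cells above handle skew with Box-Cox rather than with an interquartile-range (IQR) rule. For comparison, here is a sketch of the usual IQR rule (Tukey fences with k = 1.5), which caps values outside [Q1 - k·IQR, Q3 + k·IQR]; the helper `iqr_clip` is hypothetical. Using the describe() output above, credit_amount has Q1 = 1365.5 and Q3 = 3972.25, so its upper fence is 3972.25 + 1.5 × 2606.75 ≈ 7882; for age it is 42 + 1.5 × 15 = 64.5.

```python
import pandas as pd


def iqr_clip(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


s = pd.Series([1, 2, 3, 4, 100])
# Q1 = 2, Q3 = 4, so the fences are -1 and 7: only 100 is capped
print(iqr_clip(s).tolist())
```

On the raw data this would be applied as, e.g., `iqr_clip(origCreditDf['credit_amount'])`, before any Box-Cox step.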
#encoding the categorical variables
encoded_creditdf=pd.get_dummies(selColumns_CreditDf, columns=['credit_history','purpose',
'present_emp_since','personal_status_sex','property','credits_this_bank', 'housing','job'])
print(encoded_creditdf.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 44 columns):
default 1000 non-null int64
credit_amount 1000 non-null int32
present_res_since 1000 non-null int64
age 1000 non-null int32
people_under_maintenance 1000 non-null int64
credit_history_all credits at this bank paid back duly 1000 non-null uint8
credit_history_critical account/ other credits existing (not at this bank) 1000 non-null uint8
credit_history_delay in paying off in the past 1000 non-null uint8
credit_history_existing credits paid back duly till now 1000 non-null uint8
credit_history_no credits taken/ all credits paid back duly 1000 non-null uint8
purpose_(vacation - does not exist?) 1000 non-null uint8
purpose_business 1000 non-null uint8
purpose_car (new) 1000 non-null uint8
purpose_car (used) 1000 non-null uint8
purpose_domestic appliances 1000 non-null uint8
purpose_education 1000 non-null uint8
purpose_furniture/equipment 1000 non-null uint8
purpose_radio/television 1000 non-null uint8
purpose_repairs 1000 non-null uint8
purpose_retraining 1000 non-null uint8
present_emp_since_.. >= 7 years 1000 non-null uint8
present_emp_since_... < 1 year 1000 non-null uint8
present_emp_since_1 <= ... < 4 years 1000 non-null uint8
present_emp_since_4 <= ... < 7 years 1000 non-null uint8
present_emp_since_unemployed 1000 non-null uint8
personal_status_sex_female : divorced/separated/married 1000 non-null uint8
personal_status_sex_male : divorced/separated 1000 non-null uint8
personal_status_sex_male : married/widowed 1000 non-null uint8
personal_status_sex_male : single 1000 non-null uint8
property_if not A121 : building society savings agreement/ life insurance 1000 non-null uint8
property_if not A121/A122 : car or other, not in attribute 6 1000 non-null uint8
property_real estate 1000 non-null uint8
property_unknown / no property 1000 non-null uint8
credits_this_bank_1 1000 non-null uint8
credits_this_bank_2 1000 non-null uint8
credits_this_bank_3 1000 non-null uint8
credits_this_bank_4 1000 non-null uint8
housing_for free 1000 non-null uint8
housing_own 1000 non-null uint8
housing_rent 1000 non-null uint8
job_management/ self-employed/ highly qualified employee/ officer 1000 non-null uint8
job_skilled employee / official 1000 non-null uint8
job_unemployed/ unskilled - non-resident 1000 non-null uint8
job_unskilled - resident 1000 non-null uint8
dtypes: int32(2), int64(3), uint8(39)
memory usage: 69.4 KB
None
# Split Train/Test data 70:30 ratio
from sklearn.model_selection import train_test_split
#separating target column
y = encoded_creditdf['default']
#removing target column from features
X = encoded_creditdf.loc[:, encoded_creditdf.columns != 'default']
#70:30 train test division; pass X (not encoded_creditdf) so that the target 'default' is not among the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(700, 43) (300, 43) (700,) (300,)
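Whether the target has slipped into the feature matrix can be checked cheaply before training: any feature that is (almost) perfectly correlated with the label is suspect. A sketch with pandas, where the helper `leaky_columns` and the 0.99 cut-off are our assumptions. If encoded_creditdf rather than X is passed to train_test_split, `leaky_columns(X_train, y_train)` returns `['default']`.

```python
import pandas as pd


def leaky_columns(X: pd.DataFrame, y: pd.Series, tol: float = 0.99) -> list:
    """Names of numeric/boolean features whose |Pearson r| with y is >= tol."""
    numeric = X.select_dtypes(include=["number", "bool"]).astype(float)
    r = numeric.corrwith(y.astype(float)).abs()
    # constant columns give NaN correlations, which compare as False here
    return r[r >= tol].index.tolist()


X_toy = pd.DataFrame({"age": [25, 40, 33, 51], "default": [0, 1, 0, 1]})
y_toy = pd.Series([0, 1, 0, 1])
print(leaky_columns(X_toy, y_toy))  # ['default']
```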
# Random forest model without hyperparameter tuning
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=28)
from pprint import pprint
# Look at parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())
rf=rf.fit(X_train, y_train)
preds = rf.predict_proba(X_test)[:,1]
y_pred=rf.predict(X_test)
Parameters currently in use:
{'bootstrap': True,
'class_weight': None,
'criterion': 'gini',
'max_depth': None,
'max_features': 'auto',
'max_leaf_nodes': None,
'min_impurity_decrease': 0.0,
'min_impurity_split': None,
'min_samples_leaf': 1,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0,
'n_estimators': 'warn',
'n_jobs': None,
'oob_score': False,
'random_state': 28,
'verbose': 0,
'warm_start': False}
C:\Users\phlegmatic\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22. "10 in version 0.20 to 100 in 0.22.", FutureWarning)
#calculate Confusion Matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
def calculate_confusion_matrix(y_true, y_pred):
    # rows: true labels (0, 1); columns: predicted labels (0, 1)
    cm=confusion_matrix(y_true, y_pred)
    print(cm)
calculate_confusion_matrix(y_test, y_pred)
print(accuracy_score(y_test, y_pred))
[[209   0]
 [  2  89]]
0.9933333333333333
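scikit-learn's `confusion_matrix` puts true labels in rows and predicted labels in columns, so for labels [0, 1] it reads [[TN, FP], [FN, TP]]. Because only 30% of the applicants are defaulters, recall on class 1 (the share of bad credits that are caught) is usually more informative than accuracy. A small sketch that derives these rates from the 2×2 matrix; the helper `binary_rates` is hypothetical. Applied to the matrix printed above it gives a recall of 89/91 ≈ 0.978 for the bad-credit class.

```python
import numpy as np


def binary_rates(cm) -> dict:
    """Accuracy, precision, recall and specificity from [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = np.asarray(cm).ravel()
    total = tn + fp + fn + tp

    def ratio(num, den):
        # guard against an empty row or column in the matrix
        return float(num / den) if den else 0.0

    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


print(binary_rates([[209, 0], [2, 89]]))
```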
# View a list of the features and their importance scores
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1][:15]
# names of the training columns, in the same order as rf.feature_importances_
features = X_train.columns
#plot it
plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), features[indices])
plt.xlabel('Relative Importance')
Text(0.5, 0, 'Relative Importance')
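The plot labels credit amount, credit history delay and age as the most important features for classifying a person's profile. In the recorded run, however, the bar labels are shifted by one position: the importances were computed over the 44 columns of X_train, which still included default, while the label list was built with default dropped. The top bar, labelled credit amount, therefore belongs to the leaked default column itself, and the ranking has to be recomputed once the target is excluded from the features.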
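Keeping the importances and the column names in one pandas Series ties each score to its column, and a length mismatch between the two raises an error instead of silently shifting the labels. A sketch; the helper `ranked_importances` is hypothetical. With the fitted forest it would be called as `ranked_importances(rf.feature_importances_, X_train.columns)`.

```python
import numpy as np
import pandas as pd


def ranked_importances(importances, columns, top: int = 15) -> pd.Series:
    """Pair each importance with its column name and return the top entries."""
    importances = np.asarray(importances)
    if len(importances) != len(columns):
        raise ValueError(
            f"{len(importances)} importances but {len(columns)} column names"
        )
    return pd.Series(importances, index=columns).sort_values(ascending=False).head(top)


top_feats = ranked_importances([0.1, 0.6, 0.3], ["age", "credit_amount", "housing_own"], top=2)
print(top_feats)
# for a plot: top_feats[::-1].plot.barh() draws the largest bar at the top
```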
trainResult = rf.score(X_train, y_train)
testResult = rf.score(X_test, y_test)
print("Train Accuracy:",(trainResult*100.0))
print("Test Accuracy:" ,(testResult*100.0))
Train Accuracy: 100.0
Test Accuracy: 99.33333333333333
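The cell that fitted the first forest also stored `preds = rf.predict_proba(X_test)[:,1]` without using it. Those scores support a threshold-free metric, ROC AUC: the probability that a randomly chosen defaulter gets a higher score than a randomly chosen non-defaulter, with ties counted as one half. scikit-learn provides it as `sklearn.metrics.roc_auc_score(y_test, preds)`; the NumPy sketch below only spells out the pairwise definition (the helper `pairwise_auc` is hypothetical) and is quadratic in the number of samples, so it is meant for illustration.

```python
import numpy as np


def pairwise_auc(y_true, scores) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For the notebook's data, `pairwise_auc(y_test, preds)` should agree with `roc_auc_score(y_test, preds)`.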
# Hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
# (for classifiers 'auto' means the same as 'sqrt'; 'auto' was removed in scikit-learn 1.3)
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf,
'bootstrap': bootstrap}
pprint(random_grid)
{'bootstrap': [True, False],
'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
'max_features': ['auto', 'sqrt'],
'min_samples_leaf': [1, 2, 4],
'min_samples_split': [2, 5, 10],
'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}
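The random grid spans 2 × 12 × 2 × 3 × 3 × 10 = 4,320 combinations, and with n_iter = 10 the random search evaluates only 10 of them (about 0.23%). The later grid search is exhaustive over its smaller grid: 1 × 3 × 2 × 3 × 3 × 2 = 108 candidates, matching the "108 candidates" in its log. A quick way to count them; the helper `grid_size` is hypothetical, and scikit-learn's `ParameterGrid` also supports `len()`.

```python
from math import prod


def grid_size(grid: dict) -> int:
    """Number of combinations in a full grid of parameter lists."""
    return prod(len(values) for values in grid.values())


random_grid = {
    "n_estimators": list(range(200, 2001, 200)),
    "max_features": ["auto", "sqrt"],
    "max_depth": list(range(10, 111, 10)) + [None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}
param_grid = {
    "bootstrap": [True],
    "max_depth": [40, 50, 60],
    "max_features": [2, 3],
    "min_samples_leaf": [2, 3, 4],
    "min_samples_split": [8, 10, 12],
    "n_estimators": [100, 200],
}
print(grid_size(random_grid), grid_size(param_grid))  # 4320 108
```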
# Use the random grid to search for best hyperparameters
# First create the base model to tune
rf = RandomForestClassifier()
# Random search of parameters, using 3-fold cross validation,
# search across 10 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 10, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
rf_random.fit(X_train, y_train)
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers. [Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 20.6s finished
RandomizedSearchCV(cv=3, error_score='raise-deprecating',
estimator=RandomForestClassifier(bootstrap=True,
class_weight=None,
criterion='gini',
max_depth=None,
max_features='auto',
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators='warn',
n_jobs=None,
oob_sc...
param_distributions={'bootstrap': [True, False],
'max_depth': [10, 20, 30, 40, 50, 60,
70, 80, 90, 100, 110,
None],
'max_features': ['auto', 'sqrt'],
'min_samples_leaf': [1, 2, 4],
'min_samples_split': [2, 5, 10],
'n_estimators': [200, 400, 600, 800,
1000, 1200, 1400, 1600,
1800, 2000]},
pre_dispatch='2*n_jobs', random_state=42, refit=True,
return_train_score=False, scoring=None, verbose=2)
#best parameters from fitting the random search:
rf_random.best_params_
{'n_estimators': 200,
'min_samples_split': 10,
'min_samples_leaf': 2,
'max_features': 'sqrt',
'max_depth': 50,
'bootstrap': True}
#Evaluate Random Search
#To determine if random search yielded a better model, we compare the base model with the best random search model on the test set.
def evaluate(model, test_features, test_labels):
    predictions = model.predict(test_features)
    # share of misclassified samples; the labels are 0/1, so a percentage error that divides by the label is undefined for class 0
    error_rate = np.mean(predictions != np.asarray(test_labels))
    accuracy = 100 * (1 - error_rate)
    print('Model Performance')
    print('Misclassification rate: {:0.4f}'.format(error_rate))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    return accuracy
base_model = RandomForestClassifier(n_estimators = 10, random_state = 42)
base_model.fit(X_train, y_train)
base_accuracy = evaluate(base_model, X_test, y_test)
Model Performance
Misclassification rate: 0.0167
Accuracy = 98.33%.
best_random = rf_random.best_estimator_
random_accuracy = evaluate(best_random, X_test, y_test)
print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy - base_accuracy) / base_accuracy))
Model Performance
Misclassification rate: 0.0000
Accuracy = 100.00%.
Improvement of 1.69%.
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [40, 50, 60],
    'max_features': [2, 3],
    'min_samples_leaf': [2,3,4],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100,200]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the grid search model with k-fold cross-validation (k=3)
grid_search = GridSearchCV(estimator = rf, param_grid = param_grid,
                           cv = 3, n_jobs = -1, verbose = 2)
# Fit the grid search to the data
grid_search.fit(X_train, y_train)
grid_search.best_params_
Fitting 3 folds for each of 108 candidates, totalling 324 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 2.3s
[Parallel(n_jobs=-1)]: Done 154 tasks | elapsed: 12.1s
[Parallel(n_jobs=-1)]: Done 324 out of 324 | elapsed: 24.9s finished
C:\Users\phlegmatic\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:813: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
DeprecationWarning)
{'bootstrap': True,
'max_depth': 50,
'max_features': 3,
'min_samples_leaf': 2,
'min_samples_split': 8,
'n_estimators': 200}
best_grid = grid_search.best_estimator_
grid_accuracy = evaluate(best_grid, X_test, y_test)
print('Improvement of {:0.2f}%.'.format( 100 * (grid_accuracy - base_accuracy) / base_accuracy))
Model Performance
Misclassification rate: 0.0033
Accuracy = 99.67%.
Improvement of 1.36%.
num_folds = 5
seed = 28
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
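# unshuffled folds: random_state only has an effect with shuffle=True, and newer scikit-learn versions raise an error if it is set without it
kfold = KFold(n_splits=num_folds)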
results = cross_val_score(best_grid, X_train, y_train, cv=kfold)
print(results)
print("All column cross_val_score: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
[0.99285714 0.98571429 1. 1. 0.99285714]
All column cross_val_score: 99.429% (0.535%)
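Plain KFold without shuffling cuts the training rows into contiguous blocks, so each fold's share of defaulters depends on the row order. `StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)` from `sklearn.model_selection` keeps the class ratio roughly equal across folds. The sketch below is a simplified NumPy re-implementation of that idea, to show what stratification does; it is not scikit-learn's exact algorithm, and `stratified_folds` is a hypothetical helper.

```python
import numpy as np


def stratified_folds(y, n_splits: int, seed: int) -> np.ndarray:
    """Assign each sample a fold id so that every class is spread evenly over the folds."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)  # random order within the class
        folds[idx] = np.arange(len(idx)) % n_splits  # deal the class out round-robin
    return folds


y_toy = np.array([0] * 70 + [1] * 30)
folds = stratified_folds(y_toy, n_splits=5, seed=28)
for k in range(5):
    in_fold = y_toy[folds == k]
    print(k, len(in_fold), in_fold.mean())
```

With a 70/30 target every fold receives 14 negatives and 6 positives, i.e. the same 30% default rate as the whole set.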
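Box-Cox transforming key columns such as age and credit_amount, and dropping the columns that showed only weak associations, made our model simpler.
Hyperparameter tuning with RandomizedSearchCV and GridSearchCV produced models that scored higher than the 10-tree base model on the test set, and 5-fold cross-validation of the tuned model gave about 99% accuracy on the training data. These near-perfect scores should not be taken at face value, though: in the recorded run the features passed to train_test_split were encoded_creditdf, which still contains the target column default (hence 44 feature columns instead of 43), so the model could read the label directly. The scores have to be recomputed with default excluded from the features before they can justify confidence in classifying a person's profile as good credit or bad credit.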